Timo Breuer, Christin Katharina Kreutz, Philipp Schaer, Dirk Tunger
Digital libraries in the scientific domain provide users access to a wide
range of information to satisfy their diverse information needs. Here, ranking
results play a crucial role in users' satisfaction. Exploiting bibliometric
metadata, e.g., publications' citation counts or bibliometric indicators in
general, for automatically identifying the most relevant results can boost
retrieval performance. This work proposes bibliometric data fusion, which
enriches existing systems' results by incorporating bibliometric metadata such
as citations or altmetrics. Our results on three biomedical retrieval
benchmarks from TREC Precision Medicine (TREC-PM) show that bibliometric data
fusion is a promising approach to improve retrieval performance in terms of
normalized Discounted Cumulated Gain (nDCG) and Average Precision (AP), at the
cost of the Precision at 10 (P@10) rate. Patient users especially profit from
this lightweight, data-sparse technique that applies to any digital library.
Authors' comments: 10 pages + references, conference paper accepted at JCDL'23
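As a rough illustration of the bibliometric data fusion idea in the abstract above, the sketch below re-ranks an existing system's results by mixing its retrieval score with a citation signal. It is not the authors' method: the function names, the weight `alpha`, the min-max normalization, and the log damping of citation counts are all assumptions made for this example.

```python
import math


def min_max(values):
    """Scale values to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def fuse_with_citations(results, citations, alpha=0.8):
    """Re-rank (doc_id, score) pairs using citation counts.

    results:   list of (doc_id, retrieval_score) from an existing system
    citations: dict doc_id -> citation count (missing ids count as 0)
    alpha:     hypothetical weight on the original retrieval score
    """
    if not results:
        return []
    doc_ids = [doc_id for doc_id, _ in results]
    rel = min_max([score for _, score in results])
    # log1p damps the heavy tail of citation counts (an assumption).
    cit = min_max([math.log1p(citations.get(d, 0)) for d in doc_ids])
    fused = [
        (d, alpha * r + (1.0 - alpha) * c)
        for d, r, c in zip(doc_ids, rel, cit)
    ]
    return sorted(fused, key=lambda pair: pair[1], reverse=True)
```

With `alpha=1.0` the original ranking is preserved. Lower values let highly cited documents move up, which loosely matches the trade-off the abstract reports between nDCG/AP and P@10.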
Lucas Georges Gabriel Charpentier, Sondre Wold, David Samuel, Egil Rønningstad
Retrieval-based language models are increasingly employed in
question-answering tasks. These models search a corpus of documents for
relevant information instead of storing all factual knowledge in their
parameters, thereby enhancing efficiency, transparency, and adaptability. We
develop the first Norwegian retrieval-based model by adapting the REALM
framework and evaluating it on various tasks. After training, we also separate
the language model, which we call the reader, from the retriever components,
and show that the reader can be fine-tuned on a range of downstream tasks. Results
show that retrieval augmented language modeling improves the reader's
performance on extractive question-answering, suggesting that this type of
training improves language models' general ability to use context and that this
does not happen at the expense of other abilities such as part-of-speech
tagging, dependency parsing, named entity recognition, and lemmatization. Code,
trained models, and data are made publicly available.
Authors' comments: Accepted for NoDaLiDa 2023, main conference
Yen-Chieh Lien, Hamed Zamani, W. Bruce Croft
Neural ranking models (NRMs) have demonstrated effective performance in several information retrieval (IR) tasks. However, training NRMs often requires large-scale training data, which is difficult and expensive to obtain. To address this issue, one can train NRMs via weak supervision, where a large dataset is automatically generated using an existing ranking model (called the weak labeler) for training NRMs. Weakly supervised NRMs can generalize from the observed data and significantly outperform the weak labeler. This paper generalizes this idea through an iterative re-labeling process, demonstrating that weakly supervised models can iteratively play the role of weak labeler and significantly improve ranking performance without using manually labeled data. The proposed Generalized Weak Supervision (GWS) solution is generic and orthogonal to the ranking model architecture. This paper offers four implementations of GWS: self-labeling, cross-labeling, joint cross- and self-labeling, and greedy multi-labeling. GWS also benefits from a query importance weighting mechanism based on query performance prediction methods to reduce noise in the generated training data. We further draw a theoretical connection between self-labeling and Expectation-Maximization. Our experiments on two passage retrieval benchmarks suggest that all implementations of GWS lead to substantial improvements compared to weak supervision in all cases.
Kévin Deturck, Parantapa Goswami, Damien Nouvel, Frédérique Segond
In this paper, we present our participation in the CLEF MC2 2018 edition, Task 2: Mining opinion argumentation. The task consists of detecting the most argumentative and diverse Tweets about some festivals in English and French from a massive multilingual collection. We measure the argumentativity of a Tweet by computing the number of argumentation compounds it contains. We define an argumentation compound as the combination of an opinion expression, its support with facts, and a particular structure. Regarding diversity, we consider the number of festival aspects covered by Tweets. An initial step filters the original dataset to fit the language and topic requirements of the task. Then, we compute and integrate linguistic descriptors to detect claims and their respective justifications in Tweets. The final step extracts the most diverse arguments by clustering Tweets according to their textual content and selecting the most argumentative ones from each cluster. We conclude the paper by describing the different ways we combined the descriptors across the runs we submitted and by discussing their results.
Si Sun, Yida Lu, Shi Yu, Xiangyang Li, Zhonghua Li, Zhao Cao, Zhiyuan Liu, Deiming Ye et al.
Few-shot dense retrieval (DR) aims to effectively generalize to novel search
scenarios by learning from a few samples. Despite its importance, there has been
little study of specialized datasets and standardized evaluation protocols. As a
result, current methods often resort to random sampling from supervised
datasets to create "few-data" setups and employ inconsistent training
strategies during evaluations, which poses a challenge in accurately comparing
recent progress. In this paper, we propose a customized FewDR dataset and a
unified evaluation benchmark. Specifically, FewDR employs class-wise sampling
to establish a standardized "few-shot" setting with finely-defined classes,
reducing variability in multiple sampling rounds. Moreover, the dataset is
split into disjoint base and novel classes, allowing DR models to be continuously
trained on ample data from base classes and a few samples in novel classes.
This benchmark eliminates the risk of novel class leakage, providing a reliable
estimation of the DR model's few-shot ability. Our extensive empirical results
reveal that current state-of-the-art DR models still face challenges in the
standard few-shot setting. Our code and data will be open-sourced at
https://github.com/OpenMatch/ANCE-Tele.
Authors' comments: Work in progress
Weiwei Sun, Lingyong Yan, Zheng Chen, Shuaiqiang Wang, Haichao Zhu, Pengjie Ren, Zhumin Chen, Dawei Yin et al.
Conventional document retrieval techniques are mainly based on the index-retrieve paradigm. It is challenging to optimize pipelines based on this paradigm in an end-to-end manner. As an alternative, generative retrieval represents documents as identifiers (docid) and retrieves documents by generating docids, enabling end-to-end modeling of document retrieval tasks. However, it is an open question how one should define the document identifiers. Current approaches to the task of defining document identifiers rely on fixed rule-based docids, such as the title of a document or the result of clustering BERT embeddings, which often fail to capture the complete semantic information of a document. We propose GenRet, a document tokenization learning method to address the challenge of defining document identifiers for generative retrieval. GenRet learns to tokenize documents into short discrete representations (i.e., docids) via a discrete auto-encoding approach. Three components are included in GenRet: (i) a tokenization model that produces docids for documents; (ii) a reconstruction model that learns to reconstruct a document based on a docid; and (iii) a sequence-to-sequence retrieval model that generates relevant document identifiers directly for a designated query. By using an auto-encoding framework, GenRet learns semantic docids in a fully end-to-end manner. We also develop a progressive training scheme to capture the autoregressive nature of docids and to stabilize training. We conduct experiments on the NQ320K, MS MARCO, and BEIR datasets to assess the effectiveness of GenRet. GenRet establishes a new state of the art on the NQ320K dataset. In particular, compared to generative retrieval baselines, GenRet achieves significant improvements on unseen documents. GenRet also outperforms comparable baselines on MS MARCO and BEIR, demonstrating the method's generalizability.
Mingyuan Zhang, Xinying Guo, Liang Pan, Zhongang Cai, Fangzhou Hong, Huirong Li, Lei Yang, Ziwei Liu
3D human motion generation is crucial for the creative industry. Recent advances rely on generative models with domain knowledge for text-driven motion generation, leading to substantial progress in capturing common motions. However, the performance on more diverse motions remains unsatisfactory. In this work, we propose ReMoDiffuse, a diffusion-model-based motion generation framework that integrates a retrieval mechanism to refine the denoising process. ReMoDiffuse enhances the generalizability and diversity of text-driven motion generation with three key designs: 1) Hybrid Retrieval finds appropriate references from the database in terms of both semantic and kinematic similarities. 2) Semantic-Modulated Transformer selectively absorbs retrieval knowledge, adapting to the difference between retrieved samples and the target motion sequence. 3) Condition Mixture better utilizes the retrieval database during inference, overcoming the scale sensitivity in classifier-free guidance. Extensive experiments demonstrate that ReMoDiffuse outperforms state-of-the-art methods by balancing both text-motion consistency and motion quality, especially for more diverse motion generation.
Esteban Marquer, Miguel Couceiro
Analogical inference is a remarkable capability of human reasoning, and has been used to solve hard reasoning tasks. Analogy-based reasoning (AR) has gained increasing interest from the artificial intelligence community and has shown its potential in multiple machine learning tasks such as classification, decision making, and recommendation, with competitive results. We propose a deep learning (DL) framework to tackle two key tasks in AR: analogy detection and solving. The framework is thoroughly tested on the Siganalogies dataset of morphological analogical proportions (APs) between words, and shown to outperform symbolic approaches in many languages. Previous work has explored the behavior of the Analogy Neural Network for classification (ANNc) on analogy detection and of the Analogy Neural Network for retrieval (ANNr) on analogy solving by retrieval, as well as the potential of an autoencoder (AE) for analogy solving by generating the solution word. In this article, we summarize these findings and extend them by combining ANNr and the AE embedding model, and by checking the performance of ANNc as a retrieval method. The combination of ANNr and AE outperforms the other approaches in almost all cases, and ANNc as a retrieval method achieves competitive or better performance than 3CosMul. We conclude with general guidelines on using our framework to tackle APs with DL.
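For readers unfamiliar with the 3CosMul baseline named in the abstract above, here is a minimal sketch of analogy solving by retrieval with that scoring rule. It solves "a : b :: c : ?" by picking the vocabulary word that is close to b and c and far from a, with cosine similarities shifted into [0, 1]. The function names are illustrative, and this is only the baseline: it is not the ANNc, ANNr, or AE models described in the abstract.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def three_cos_mul(a, b, c, vocab, eps=1e-3):
    """Solve a : b :: c : ? by retrieval over vocab (dict word -> vector)."""

    def sim(u, v):
        # Shift cosine from [-1, 1] to [0, 1] so the ratio stays non-negative.
        return (cosine(u, v) + 1.0) / 2.0

    best_word, best_score = None, float("-inf")
    for word, vec in vocab.items():
        if word in (a, b, c):
            continue  # the three given words are excluded as candidates
        score = (
            sim(vec, vocab[b]) * sim(vec, vocab[c])
            / (sim(vec, vocab[a]) + eps)
        )
        if score > best_score:
            best_word, best_score = word, score
    return best_word
```

The small `eps` term only guards against division by zero. The quality of the answer depends entirely on the embeddings supplied in `vocab`, which are assumed to exist already.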
Abhay Zala, Jaemin Cho, Satwik Kottur, Xilun Chen, Barlas Oğuz, Yasher Mehdad, Mohit Bansal
There is growing interest in searching for information from large video
corpora. Prior works have studied relevant tasks, such as text-based video
retrieval, moment retrieval, video summarization, and video captioning in
isolation, without an end-to-end setup that can jointly search from video
corpora and generate summaries. Such an end-to-end setup would allow for many
interesting applications, e.g., a text-based search that finds a relevant video
from a video corpus, extracts the most relevant moment from that video, and
segments the moment into important steps with captions. To address this, we
present the HiREST (HIerarchical REtrieval and STep-captioning) dataset and
propose a new benchmark that covers hierarchical information retrieval and
visual/textual stepwise summarization from an instructional video corpus.
HiREST consists of 3.4K text-video pairs from an instructional video dataset,
where 1.1K videos have annotations of moment spans relevant to the text query and
a breakdown of each moment into key instruction steps with captions and timestamps
(totaling 8.6K step captions). Our hierarchical benchmark consists of video
retrieval, moment retrieval, and two novel moment segmentation and step
captioning tasks. In moment segmentation, models break down a video moment into
instruction steps and identify start-end boundaries. In step captioning, models
generate a textual summary for each step. We also present task-specific and
end-to-end joint baseline models as starting points for our new benchmark. While
the baseline models show some promising results, there remains substantial room
for future improvement by the community. Project website:
https://hirest-cvpr2023.github.io
Authors' comments: CVPR 2023 (15 pages; the first two authors contributed equally)
Luca Zancato, Alessandro Achille, Tian Yu Liu, Matthew Trager, Pramuditha Perera, Stefano Soatto
We introduce Train/Test-Time Adaptation with Retrieval (${\rm T^3AR}$), a method to adapt models both at train and test time by means of a retrieval module and a searchable pool of external samples. Before inference, ${\rm T^3AR}$ adapts a given model to the downstream task using refined pseudo-labels and a self-supervised contrastive objective function whose noise distribution leverages retrieved real samples to improve feature adaptation on the target data manifold. The retrieval of real images is key to ${\rm T^3AR}$ since it does not rely solely on synthetic data augmentations to compensate for the lack of adaptation data, as typically done by other adaptation algorithms. Furthermore, thanks to the retrieval module, our method gives the user or service provider the possibility to improve model adaptation on the downstream task by incorporating further relevant data or to fully remove samples that may no longer be available due to changes in user preference after deployment. First, we show that ${\rm T^3AR}$ can be used at training time to improve downstream fine-grained classification over standard fine-tuning baselines, and the fewer the adaptation data the higher the relative improvement (up to 13%). Second, we apply ${\rm T^3AR}$ for test-time adaptation and show that exploiting a pool of external images at test-time leads to more robust representations over existing methods on DomainNet-126 and VISDA-C, especially when few adaptation data are available (up to 8%).
Alexander Black, Simon Jenni, Tu Bui, Md. Mehrab Tanjim, Stefano Petrangeli, Ritwik Sinha, Viswanathan Swaminathan, John Collomosse
We propose VADER, a spatio-temporal matching, alignment, and change summarization method to help fight misinformation spread via manipulated videos. VADER matches and coarsely aligns partial video fragments to candidate videos using a robust visual descriptor and scalable search over adaptively chunked video content. A transformer-based alignment module then refines the temporal localization of the query fragment within the matched video. A space-time comparator module identifies regions of manipulation between aligned content, invariant to any changes due to any residual temporal misalignments or artifacts arising from non-editorial changes of the content. Robustly matching video to a trusted source enables conclusions to be drawn on video provenance, enabling informed trust decisions on content encountered.
Thong Nguyen, Sean MacAvaney, Andrew Yates
Learned sparse retrieval (LSR) is a family of first-stage retrieval methods that are trained to generate sparse lexical representations of queries and documents for use with an inverted index. Many LSR methods have been recently introduced, with Splade models achieving state-of-the-art performance on MSMarco. Despite similarities in their model architectures, many LSR methods show substantial differences in effectiveness and efficiency. Differences in the experimental setups and configurations used make it difficult to compare the methods and derive insights. In this work, we analyze existing LSR methods and identify key components to establish an LSR framework that unifies all LSR methods under the same perspective. We then reproduce all prominent methods using a common codebase and re-train them in the same environment, which allows us to quantify how components of the framework affect effectiveness and efficiency. We find that (1) including document term weighting is most important for a method's effectiveness, (2) including query weighting has a small positive impact, and (3) document expansion and query expansion have a cancellation effect. As a result, we show how removing query expansion from a state-of-the-art model can reduce latency significantly while maintaining effectiveness on MSMarco and TripClick benchmarks. Our code is publicly available at https://github.com/thongnt99/learned-sparse-retrieval
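To make the LSR setting in the abstract above concrete, the sketch below scores documents through an inverted index as a dot product between sparse query and document term weights. The weights here are hand-set placeholders. In an actual LSR method they would come from trained query and document encoders, which this example does not model. All function names are illustrative.

```python
from collections import defaultdict


def build_inverted_index(doc_vectors):
    """Map each term to a posting list of (doc_id, weight).

    doc_vectors: dict doc_id -> {term: weight}, a sparse lexical
    representation of each document.
    """
    index = defaultdict(list)
    for doc_id, terms in doc_vectors.items():
        for term, weight in terms.items():
            if weight > 0:
                index[term].append((doc_id, weight))
    return index


def search(index, query_vector, k=10):
    """Score documents by the dot product of query and document weights."""
    scores = defaultdict(float)
    for term, q_weight in query_vector.items():
        # Only posting lists of query terms are touched.
        for doc_id, d_weight in index.get(term, []):
            scores[doc_id] += q_weight * d_weight
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)[:k]
```

In this picture, query expansion adds terms to `query_vector` and so touches more posting lists per query. That gives some intuition for the abstract's finding that removing query expansion can reduce latency.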
Xinnian Liang, Shuangzhi Wu, Hui Huang, Jiaqi Bai, Chao Bian, Zhoujun Li
Retrieval augmented methods have shown promising results in various
classification tasks. However, existing methods focus on retrieving extra
context to enrich the input, which is noise sensitive and non-expandable. In
this paper, following this line, we propose a $k$-nearest-neighbor (KNN)-based
method for retrieval augmented classification, which interpolates the
predicted label distribution with retrieved instances' label distributions.
Different from the standard KNN process, we propose a decoupling mechanism as
we find that shared representation for classification and retrieval hurts
performance and leads to training instability. We evaluate our method on a wide
range of classification datasets. Experimental results demonstrate the
effectiveness and robustness of our proposed method. We also conduct extra
experiments to analyze the contributions of different components in our
model. Code: https://github.com/xnliang98/knn-cls-w-decoupling
Authors' comments: preprint
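The interpolation described in the abstract above can be pictured with a small sketch: a classifier's predicted label distribution is mixed with a distribution built from the labels of the nearest stored instances. Everything specific here is an assumption for illustration, including the squared Euclidean distance, the softmax weighting with `temperature`, the one-hot neighbor labels, and the mixing weight `lam`. The retrieval vector is passed in separately from the classifier output, which only loosely mirrors the decoupling idea and is not the authors' mechanism.

```python
import math


def knn_label_distribution(query, memory, num_labels, k=3, temperature=1.0):
    """Build a label distribution from the k nearest stored instances.

    query:  retrieval vector of the input (list of floats)
    memory: list of (vector, label) pairs with integer labels
    """
    scored = [
        (sum((q - m) ** 2 for q, m in zip(query, vec)), label)
        for vec, label in memory
    ]
    neighbors = sorted(scored, key=lambda pair: pair[0])[:k]
    # Subtract the smallest distance before exponentiating for stability.
    d_min = neighbors[0][0]
    weights = [math.exp(-(d - d_min) / temperature) for d, _ in neighbors]
    total = sum(weights)
    dist = [0.0] * num_labels
    for weight, (_, label) in zip(weights, neighbors):
        dist[label] += weight / total
    return dist


def interpolate(model_probs, knn_probs, lam=0.5):
    """Mix the classifier's distribution with the neighbor-based one."""
    return [lam * kp + (1.0 - lam) * mp
            for mp, kp in zip(model_probs, knn_probs)]
```

Setting `lam=0.0` recovers the plain classifier, and `lam=1.0` relies on the retrieved instances alone.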
Ryan J. MacDonald, Natasha E. Batalha
Exoplanet atmospheric retrieval is a computational technique widely used to
infer properties of planetary atmospheres from remote spectroscopic
observations. Retrieval codes typically employ Bayesian sampling algorithms or
machine learning approaches to explore the range of atmospheric properties
(e.g., chemical composition, temperature structure, aerosols) compatible with
an observed spectrum. However, despite the wide adoption of exoplanet retrieval
techniques, there is currently no systematic summary of exoplanet retrieval
codes in the literature. Here, we provide a catalogue of the atmospheric
retrieval codes published to date, alongside links to their respective code
repositories where available. Our catalogue will be continuously updated via a
Zenodo archive.
Authors' comments: 5 pages, 1 giant Table. Published in RNAAS. Live catalogue will be
updated at https://doi.org/10.5281/zenodo.7675743
Yidan Zhang, Ting Zhang, Dong Chen, Yujing Wang, Qi Chen, Xing Xie, Hao Sun, Weiwei Deng et al.
While generative modeling has become prevalent across numerous research
fields, its integration into the realm of image retrieval remains largely
unexplored and underjustified. In this paper, we present a novel methodology,
reframing image retrieval as a variant of generative modeling and employing a
sequence-to-sequence model. This approach is harmoniously aligned with the
current trend towards unification in research, presenting a cohesive framework
that allows for end-to-end differentiable searching. This, in turn, facilitates
superior performance via direct optimization techniques. The development of our
model, dubbed IRGen, addresses the critical technical challenge of converting
an image into a concise sequence of semantic units, which is pivotal for
enabling efficient and effective search. Extensive experiments demonstrate that
our model achieves state-of-the-art performance on three widely-used image
retrieval benchmarks as well as two million-scale datasets, yielding
significant improvement compared to prior competitive retrieval methods. In
addition, the notable surge in precision scores facilitated by generative
modeling presents the potential to bypass the reranking phase, which is
traditionally indispensable in practical retrieval workflows.
Authors' comments: Accepted by ECCV 2024
Abhra Chaudhuri, Ayan Kumar Bhunia, Yi-Zhe Song, Anjan Dutta
Rising concerns about privacy and anonymity preservation of deep learning
models have facilitated research in data-free learning (DFL). For the first
time, we identify that for data-scarce tasks like Sketch-Based Image Retrieval
(SBIR), where the difficulty in acquiring paired photos and hand-drawn sketches
limits data-dependent cross-modal learning algorithms, DFL can prove to be a
much more practical paradigm. We thus propose Data-Free (DF)-SBIR, where,
unlike existing DFL problems, pre-trained, single-modality classification
models have to be leveraged to learn a cross-modal metric-space for retrieval
without access to any training data. The widespread availability of pre-trained
classification models, along with the difficulty in acquiring paired
photo-sketch datasets for SBIR, justifies the practicality of this setting. We
present a methodology for DF-SBIR, which can leverage knowledge from models
independently trained to perform classification on photos and sketches. We
evaluate our model on the Sketchy, TU-Berlin, and QuickDraw benchmarks,
designing a variety of baselines based on state-of-the-art DFL literature, and
observe that our method surpasses all of them by significant margins. Our
method also achieves mAPs competitive with data-dependent approaches, all the
while requiring no training data. Implementation is available at
https://github.com/abhrac/data-free-sbir.
Authors' comments: Computer Vision and Pattern Recognition (CVPR) 2023
Tongwen Huang, Xihua Li, Chao Yi, Xuemin Zhao, Yunbo Cao
When students make a mistake in an exercise, they can consolidate their
understanding with "similar exercises" that share the same concepts, purposes,
and methods. For a given subject and study stage, the exercise bank typically
contains millions to tens of millions of exercises, so finding similar
exercises for a given exercise becomes a crucial technical problem. In
principle, we can assign a variety of explicit labels to each exercise and then
query through the labels, but label annotation is time-consuming, laborious, and
costly, with limited precision and granularity, so it is not feasible. In
practice, we define finding "similar exercises" as a retrieval process based
on recall, ranking, and re-ranking procedures, which we call the FSE (Finding
Similar Exercises) problem. Furthermore, we obtain a comprehensive
representation of the semantic information of exercises through representation
learning. In addition to a reasonable architecture, we also explore which
kinds of pre-training and supervised learning tasks are more conducive to
learning exercise semantic information. It is difficult to annotate similar
exercises, and the annotation consistency among experts is low. Therefore, this
paper also provides solutions to the problem of low-quality annotated data.
Compared with other methods, this paper has clear advantages in both
architecture rationality and algorithm precision, and the system now serves the
daily teaching of hundreds of schools.
Authors' comments: 37th Conference on AAAI 2023 Artificial Intelligence for
Education (AI4Edu)
Feng He, Qi Wang, Zhifan Feng, Wenbin Jiang, Yajuan Lv, Yong Zhu, Xiao Tan
Video retrieval is becoming increasingly important owing to the rapid
emergence of videos on the Internet. The dominant paradigm for video retrieval
learns video-text representations by pushing the similarity of positive pairs
and that of negative pairs apart by a fixed margin. However, negative pairs used
for training are sampled randomly, which means that the two items of a negative
pair may be semantically related or even equivalent, while most methods still
enforce dissimilar representations to decrease their similarity. This leads to
inaccurate supervision and poor performance in learning video-text
representations.
While most video retrieval methods overlook this phenomenon, we propose an
adaptive margin that changes with the distance between positive and negative
pairs to solve the aforementioned issue. First, we design the calculation framework
of the adaptive margin, including the method of distance measurement and the
function between the distance and the margin. Then, we explore a novel
implementation called "Cross-Modal Generalized Self-Distillation" (CMGSD),
which can be built on top of most video retrieval models with few
modifications. Notably, CMGSD adds little computational overhead at training time
and adds no computational overhead at test time. Experimental results on three
widely used datasets demonstrate that the proposed method can yield
significantly better performance than the corresponding backbone model, and it
outperforms state-of-the-art methods by a large margin.
Authors' comments: Accepted by SIGIR 2021
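The contrast between a fixed and an adaptive margin in the abstract above can be sketched with a small triplet-style loss. This is a simplified illustration, not CMGSD. The distance measure (one minus cosine similarity between the positive and negative texts), the linear mapping from distance to margin, and the cap `max_margin` are all assumptions chosen for this example.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def adaptive_margin(pos_text, neg_text, max_margin=0.2):
    """Margin that grows with the distance between positive and negative.

    Semantically equivalent items (distance 0) get a margin of 0, so the
    loss no longer forces them apart.
    """
    distance = 1.0 - cosine(pos_text, neg_text)
    distance = max(0.0, min(distance, 1.0))  # clamp to [0, 1]
    return max_margin * distance


def adaptive_triplet_loss(video, pos_text, neg_text, max_margin=0.2):
    """Hinge loss on similarities using the adaptive margin above."""
    margin = adaptive_margin(pos_text, neg_text, max_margin)
    return max(0.0, margin + cosine(video, neg_text) - cosine(video, pos_text))
```

With a fixed margin, a randomly sampled negative that happens to match the positive would still be penalized. Here the penalty shrinks as the two texts become more similar.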
Chang Ma, Haiteng Zhao, Lin Zheng, Jiayi Xin, Qintong Li, Lijun Wu, Zhihong Deng, Yang Lu et al.
Protein language models have excelled in a variety of tasks, ranging from structure prediction to protein engineering. However, proteins are highly diverse in functions and structures, and current state-of-the-art models including the latest version of AlphaFold rely on Multiple Sequence Alignments (MSA) to feed in the evolutionary knowledge. Despite their success, heavy computational overheads, as well as de novo and orphan proteins, remain great challenges in protein representation learning. In this work, we show that MSA-augmented models inherently belong to retrieval-augmented methods. Motivated by this finding, we introduce Retrieved Sequence Augmentation (RSA) for protein representation learning without additional alignment or pre-processing. RSA links query protein sequences to a set of sequences with similar structures or properties in the database and combines these sequences for downstream prediction. We show that protein language models benefit from the retrieval enhancement on both structure prediction and property prediction tasks, with a 5% improvement on MSA Transformer on average while being 373 times faster. In addition, we show that our model can transfer to new protein domains better and outperforms MSA Transformer on de novo protein prediction. Our study fills a much-encountered gap in protein prediction and brings us a step closer to demystifying the domain knowledge needed to understand protein sequences. Code is available at https://github.com/HKUNLP/RSA.
Christopher Richardson, Sudipta Kar, Anjishnu Kumar, Anand Ramachandran, Omar Zia Khan, Zeynab Raeesy, Abhinav Sethy
Open domain conversational agents can answer a broad range of targeted
queries. However, the sequential nature of interaction with these systems makes
knowledge exploration a lengthy task that burdens the user with asking a chain
of well-phrased questions. In this paper, we present a retrieval-based system
and associated dataset for predicting the next questions that the user might
have. Such a system can proactively assist users in knowledge exploration
leading to a more engaging dialog. The retrieval system is trained on a dataset
which contains ~14K multi-turn information-seeking conversations with a valid
follow-up question and a set of invalid candidates. The invalid candidates are
generated to simulate various syntactic and semantic confounders such as
paraphrases, partial entity match, irrelevant entity, and ASR errors. We use
confounder specific techniques to simulate these negative examples on the
OR-QuAC dataset and develop a dataset called the Follow-up Query Bank
(FQ-Bank). Then, we train ranking models on FQ-Bank and present results
comparing supervised and unsupervised approaches. The results suggest that we
can retrieve the valid follow-ups by ranking them in higher positions compared
to confounders, but further knowledge grounding can improve ranking
performance.
Authors' comments: EACL 2023