Jikai Wang, Abdolnaser Ghazagh, Sonam Smitha Ravi, Stefan Baumbach, Benjamin Dannecker, Michael Scharun, Dominik Bauer, Stefan Nolte et al.
A standardized phase retrieval algorithm is presented and applied to an
industry-grade high-energy ultrashort pulsed laser to uncover its spatial phase
distribution. We describe in detail how to modify the well-known algorithm in
order to characterize particularly strong light sources from intensity
measurements only. With complete information about the optical field of the
unknown light source at hand, virtual back propagation can reveal weak points
in the light path such as apertures or damaged components.
Authors' comments: Published Applied Optics manuscript, 12 pages, 10 figures
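
To make the idea of recovering phase from intensity-only measurements concrete, here is a minimal sketch of a classic two-plane iteration in the Gerchberg-Saxton style. The abstract does not name the "well-known algorithm" or describe the authors' modifications, so the choice of Gerchberg-Saxton, the 1D toy field, the naive DFT used as the propagator, and all function names below are assumptions made purely for illustration.

```python
import cmath
import math
import random


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform; stands in for optical propagation."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for m, xm in enumerate(x):
            acc += xm * cmath.exp(sign * 2j * math.pi * m * k / n)
        out.append(acc / n if inverse else acc)
    return out


def with_amplitude(field, amplitude):
    """Keep the phase of every sample but force the measured amplitude."""
    return [a * cmath.exp(1j * cmath.phase(f)) for f, a in zip(field, amplitude)]


def far_field_error(field, far_amp):
    """Mismatch between the propagated amplitude and the measured far-field amplitude."""
    return math.sqrt(sum((abs(g) - a) ** 2 for g, a in zip(dft(field), far_amp)))


def gerchberg_saxton(near_amp, far_amp, iterations=50, seed=0):
    """Alternate between two planes, imposing the measured amplitude in each."""
    rng = random.Random(seed)
    # Assumption: start from a random phase guess, since only amplitudes are known.
    field = [a * cmath.exp(2j * math.pi * rng.random()) for a in near_amp]
    errors = [far_field_error(field, far_amp)]
    for _ in range(iterations):
        far = with_amplitude(dft(field), far_amp)
        field = with_amplitude(dft(far, inverse=True), near_amp)
        errors.append(far_field_error(field, far_amp))
    return field, errors


# Hypothetical test field: Gaussian amplitude with a quadratic (defocus-like) phase.
n = 16
true_field = [
    math.exp(-((i - n / 2) ** 2) / 18.0) * cmath.exp(0.05j * (i - n / 2) ** 2)
    for i in range(n)
]
near_amp = [abs(v) for v in true_field]
far_amp = [abs(v) for v in dft(true_field)]
recovered, errors = gerchberg_saxton(near_amp, far_amp)
```

The `errors` list tracks the far-field amplitude mismatch after each iteration; for this kind of alternating projection it does not increase, although the iteration can stagnate before reaching the true phase.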
Yuxuan Lei, Jianxun Lian, Jing Yao, Mingqi Wu, Defu Lian, Xing Xie
This paper addresses the gap between general-purpose text embeddings and the
specific demands of item retrieval tasks. We demonstrate the shortcomings of
existing models in capturing the nuances necessary for zero-shot performance on
item retrieval tasks. To overcome these limitations, we propose to generate an
in-domain dataset from ten tasks tailored to unlocking models' representation
ability for item retrieval. Our empirical studies demonstrate that fine-tuning
embedding models on the dataset leads to remarkable improvements in a variety
of retrieval tasks. We also illustrate the practical application of our refined
model in a conversational setting, where it enhances the capabilities of
LLM-based Recommender Agents like Chat-Rec. Our code is available at
https://github.com/microsoft/RecAI.
Authors' comments: 4 pages, 1 figure, 4 tables
Penghao Zhao, Hailin Zhang, Qinhan Yu, Zhengren Wang, Yunteng Geng, Fangcheng Fu, Ling Yang, Wentao Zhang et al.
Advancements in model algorithms, the growth of foundational models, and
access to high-quality datasets have propelled the evolution of Artificial
Intelligence Generated Content (AIGC). Despite its notable successes, AIGC
still faces hurdles such as updating knowledge, handling long-tail data,
mitigating data leakage, and managing high training and inference costs.
Retrieval-Augmented Generation (RAG) has recently emerged as a paradigm to
address such challenges. In particular, RAG introduces the information
retrieval process, which enhances the generation process by retrieving relevant
objects from available data stores, leading to higher accuracy and better
robustness. In this paper, we comprehensively review existing efforts that
integrate RAG techniques into AIGC scenarios. We first classify RAG foundations
according to how the retriever augments the generator, distilling the
fundamental abstractions of the augmentation methodologies for various
retrievers and generators. This unified perspective encompasses all RAG
scenarios, illuminating advancements and pivotal technologies that help with
potential future progress. We also summarize additional enhancement methods
for RAG, facilitating effective engineering and implementation of RAG systems.
Then, from another view, we survey practical applications of RAG across
different modalities and tasks, offering valuable references for researchers
and practitioners. Furthermore, we introduce the benchmarks for RAG, discuss
the limitations of current RAG systems, and suggest potential directions for
future research. Github: https://github.com/PKU-DAIR/RAG-Survey.
Authors' comments: Citing 377 papers, 28 pages, 1 table, 12 figures. Project:
https://github.com/PKU-DAIR/RAG-Survey
Jiebin Zhang, Eugene J. Yu, Qinyu Chen, Chenhao Xiong, Dawei Zhu, Han Qian, Mingbo Song, Xiaoguang Li et al.
In today's fast-paced world, the growing demand to quickly generate comprehensive and accurate Wikipedia documents for emerging events is both crucial and challenging. However, previous efforts in Wikipedia generation have often fallen short of meeting real-world requirements. Some approaches focus solely on generating segments of a complete Wikipedia document, while others overlook the importance of faithfulness in generation or fail to consider the influence of the pre-training corpus. In this paper, we simulate a real-world scenario where structured full-length Wikipedia documents are generated for emergent events using input retrieved from web sources. To ensure that Large Language Models (LLMs) are not trained on corpora related to recently occurred events, we select events that have taken place recently and introduce a new benchmark Wiki-GenBen, which consists of 309 events paired with their corresponding retrieved web pages for generating evidence. Additionally, we design a comprehensive set of systematic evaluation metrics and baseline methods, to evaluate the capability of LLMs in generating factual full-length Wikipedia documents. The data and code are open-sourced at WikiGenBench.
Julian Reichinger, Thomas Krismayer, Jan Rellermeyer
Modern, large scale monitoring systems have to process and store vast amounts
of log data in near real-time. At query time the systems have to find relevant
logs based on the content of the log message using support structures that can
scale to these amounts of data while still being efficient to use. We present
our novel Compressed Probabilistic Retrieval algorithm (COPR), capable of
answering Multi-Set Multi-Membership-Queries, that can be used as an
alternative to existing indexing structures for streamed log data. In our
experiments, COPR required up to 93% less storage space than the tested
state-of-the-art inverted index and had up to four orders of magnitude fewer
false positives than the tested state-of-the-art membership sketch.
Additionally, COPR achieved up to 250 times higher query throughput than the
tested inverted index and up to 240 times higher query throughput than the
tested membership sketch.
Authors' comments: 14 pages, 8 figures
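
For readers unfamiliar with membership sketches, the following is a minimal illustration of the kind of query involved: given several tokens, which sets (for example, batches of logs) may contain all of them? It uses one plain Bloom filter per set, which is a generic baseline and not the COPR algorithm. The reading of "Multi-Set Multi-Membership-Query" used here, the class and function names, the filter sizes, and the sample batches are all assumptions.

```python
import hashlib


class BloomFilter:
    """Plain Bloom filter: no false negatives, tunable false-positive rate."""

    def __init__(self, num_bits=1024, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, token):
        # Derive several bit positions per token by salting a single hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{token}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, token):
        for pos in self._positions(token):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, token):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(token)
        )


def candidate_sets(filters, tokens):
    """Return ids of sets whose filter reports every token as possibly present."""
    return [set_id for set_id, bf in filters.items() if all(t in bf for t in tokens)]


# Hypothetical log batches, each reduced to its tokens.
batches = {
    "batch-0": ["error", "disk", "full"],
    "batch-1": ["login", "ok"],
    "batch-2": ["error", "timeout"],
}
filters = {}
for batch_id, batch_tokens in batches.items():
    bf = BloomFilter()
    for token in batch_tokens:
        bf.add(token)
    filters[batch_id] = bf

hits = candidate_sets(filters, ["error"])
```

Every batch that really contains the token is guaranteed to appear in `hits`; a batch that does not contain it may still appear with small probability, which is the false-positive behaviour the abstract compares against.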
Bin Li, Ye Shi, Qian Yu, Jingya Wang
Unsupervised cross-domain image retrieval (UCIR) aims to retrieve images sharing the same category across diverse domains without relying on labeled data. Prior approaches have typically decomposed the UCIR problem into two distinct tasks: intra-domain representation learning and cross-domain feature alignment. However, these segregated strategies overlook the potential synergies between these tasks. This paper introduces ProtoOT, a novel Optimal Transport formulation explicitly tailored for UCIR, which integrates intra-domain feature representation learning and cross-domain alignment into a unified framework. ProtoOT leverages the strengths of the K-means clustering method to effectively manage distribution imbalances inherent in UCIR. By utilizing K-means for generating initial prototypes and approximating class marginal distributions, we modify the constraints in Optimal Transport accordingly, significantly enhancing its performance in UCIR scenarios. Furthermore, we incorporate contrastive learning into the ProtoOT framework to further improve representation learning. This encourages local semantic consistency among features with similar semantics, while also explicitly enforcing separation between features and unmatched prototypes, thereby enhancing global discriminativeness. ProtoOT surpasses existing state-of-the-art methods by a notable margin across benchmark datasets. Notably, on DomainNet, ProtoOT achieves an average P@200 enhancement of 18.17%, and on Office-Home, it demonstrates a P@15 improvement of 3.83%.
Wenyuan Zhao, Yu Shin Huang, Ruida Zhou, Chao Tian
We study the problem of weakly private information retrieval (PIR) when there
is heterogeneity in servers' trustfulness under the maximal leakage (Max-L)
metric and mutual information (MI) metric. A user wishes to retrieve a desired
message from N non-colluding servers efficiently, such that the identity of the
desired message is not leaked in a significant manner; however, some servers
can be more trustworthy than others. We propose a code construction for this
setting and optimize the probability distribution for this construction. For
the Max-L metric, it is shown that the optimal probability allocation for the
proposed scheme essentially separates the delivery patterns into two parts: a
completely private part that has the same download overhead as the
capacity-achieving PIR code, and a non-private part that allows complete
privacy leakage but has no download overhead by downloading only from the most
trustful server. The optimal solution is established through a sophisticated
analysis of the underlying convex optimization problem, and a reduction between
the homogeneous setting and the heterogeneous setting. For the MI metric, the
homogeneous case is studied first for which the code can be optimized with an
explicit probability assignment, while a closed-form solution becomes
intractable for the heterogeneous case. Numerical results are provided for both
cases to corroborate the theoretical analysis.
Authors' comments: 23 pages, 3 figures. arXiv admin note: text overlap with
arXiv:2205.01611
Thong Nguyen, Mariya Hendriksen, Andrew Yates, Maarten de Rijke
Learned sparse retrieval (LSR) is a family of neural methods that encode
queries and documents into sparse lexical vectors that can be indexed and
retrieved efficiently with an inverted index. We explore the application of LSR
to the multi-modal domain, with a focus on text-image retrieval. While LSR has
seen success in text retrieval, its application in multimodal retrieval remains
underexplored. Current approaches like LexLIP and STAIR require complex
multi-step training on massive datasets. Our proposed approach efficiently
transforms dense vectors from a frozen dense model into sparse lexical vectors.
We address issues of high dimension co-activation and semantic deviation
through a new training algorithm, using Bernoulli random variables to control
query expansion. Experiments with two dense models (BLIP, ALBEF) and two
datasets (MSCOCO, Flickr30k) show that our proposed algorithm effectively
reduces co-activation and semantic deviation. Our best-performing sparsified
model outperforms state-of-the-art text-image LSR models with a shorter
training time and lower GPU memory requirements. Our approach offers an
effective solution for training LSR retrieval models in multimodal settings.
Our code and model checkpoints are available at
github.com/thongnt99/lsr-multimodal
Authors' comments: 17 pages, accepted as a full paper at ECIR 2024
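
The retrieval step that makes sparse lexical vectors attractive can be sketched in a few lines: once queries and items are expressed as term-to-weight maps, an inverted index scores only the items that share a term with the query. The sketch below shows that generic mechanism and not the authors' training algorithm; the term weights, image ids, and function names are hypothetical.

```python
from collections import defaultdict


def build_inverted_index(doc_vectors):
    """Map each term to a posting list of (doc_id, weight) pairs."""
    index = defaultdict(list)
    for doc_id, vector in doc_vectors.items():
        for term, weight in vector.items():
            if weight > 0:
                index[term].append((doc_id, weight))
    return index


def search(index, query_vector, top_k=3):
    """Score documents by the dot product, touching only shared terms."""
    scores = defaultdict(float)
    for term, q_weight in query_vector.items():
        for doc_id, d_weight in index.get(term, []):
            scores[doc_id] += q_weight * d_weight
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:top_k]


# Hypothetical sparse vectors, as a sparsified image encoder might produce them.
image_vectors = {
    "img-1": {"dog": 1.5, "grass": 0.75, "ball": 0.5},
    "img-2": {"cat": 1.25, "sofa": 1.0},
    "img-3": {"dog": 0.5, "beach": 1.5},
}
index = build_inverted_index(image_vectors)
results = search(index, {"dog": 1.0, "ball": 2.0})
```

Here `img-2` is never scored because it shares no term with the query, which is the source of the efficiency that the abstract attributes to inverted-index retrieval.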
Raunak Manekar, Elisa Negrini, Minh Pham, Daniel Jacobs, Jaideep Srivastava, Stanley J. Osher, Jianwei Miao
Phase retrieval (PR) is fundamentally important in scientific imaging and is crucial for nanoscale techniques like coherent diffractive imaging (CDI). Low radiation dose imaging is essential for applications involving radiation-sensitive samples. However, most PR methods struggle in low-dose scenarios due to high shot noise. Recent advancements in optical data acquisition setups, such as in-situ CDI, have shown promise for low-dose imaging, but they rely on a time series of measurements, making them unsuitable for single-image applications. Similarly, data-driven phase retrieval techniques are not easily adaptable to data-scarce situations. Zero-shot deep learning methods based on pre-trained and implicit generative priors have been effective in various imaging tasks but have shown limited success in PR. In this work, we propose low-dose deep image prior (LoDIP), which combines in-situ CDI with the power of implicit generative priors to address single-image low-dose phase retrieval. Quantitative evaluations demonstrate LoDIP's superior performance in this task and its applicability to real experimental scenarios.
Haowei Liu, Yaya Shi, Haiyang Xu, Chunfeng Yuan, Qinghao Ye, Chenliang Li, Ming Yan, Ji Zhang et al.
In video-text retrieval, most existing methods adopt the dual-encoder
architecture for fast retrieval, which employs two individual encoders to
extract global latent representations for videos and texts. However, they face
challenges in capturing fine-grained semantic concepts. In this work, we
propose the UNIFY framework, which learns lexicon representations to capture
fine-grained semantics and combines the strengths of latent and lexicon
representations for video-text retrieval. Specifically, we map videos and texts
into a pre-defined lexicon space, where each dimension corresponds to a
semantic concept. A two-stage semantics grounding approach is proposed to
activate semantically relevant dimensions and suppress irrelevant dimensions.
The learned lexicon representations can thus reflect fine-grained semantics of
videos and texts. Furthermore, to leverage the complementarity between latent
and lexicon representations, we propose a unified learning scheme to facilitate
mutual learning via structure sharing and self-distillation. Experimental
results show our UNIFY framework largely outperforms previous video-text
retrieval methods, with 4.8% and 8.2% Recall@1 improvement on MSR-VTT and
DiDeMo respectively.
Authors' comments: Accepted to LREC-COLING 2024
Hanseok Oh, Hyunji Lee, Seonghyeon Ye, Haebin Shin, Hansol Jang, Changwook Jun, Minjoon Seo
Despite the critical need to align search targets with users' intention, retrievers often only prioritize query information without delving into the users' intended search context. Enhancing the capability of retrievers to understand intentions and preferences of users, akin to language model instructions, has the potential to yield more aligned search targets. Prior studies restrict the application of instructions in information retrieval to a task description format, neglecting the broader context of diverse and evolving search scenarios. Furthermore, the prevailing benchmarks utilized for evaluation lack explicit tailoring to assess instruction-following ability, thereby hindering progress in this field. In response to these limitations, we propose a novel benchmark, INSTRUCTIR, specifically designed to evaluate instruction-following ability in information retrieval tasks. Our approach focuses on user-aligned instructions tailored to each query instance, reflecting the diverse characteristics inherent in real-world search scenarios. Through experimental analysis, we observe that retrievers fine-tuned to follow task-style instructions, such as INSTRUCTOR, can underperform compared to their non-instruction-tuned counterparts. This underscores potential overfitting issues inherent in constructing retrievers trained on existing instruction-aware retrieval datasets.
Wanqing Cui, Keping Bi, Jiafeng Guo, Xueqi Cheng
Since commonsense information is recorded significantly less frequently than it actually occurs, language models pre-trained by text generation have difficulty learning sufficient commonsense knowledge. Several studies have leveraged text retrieval to augment the models' commonsense ability. Unlike text, images capture commonsense information inherently, but little effort has been made to utilize them effectively. In this work, we propose a novel Multi-mOdal REtrieval (MORE) augmentation framework to leverage both text and images to enhance the commonsense ability of language models. Extensive experiments on the Common-Gen task have demonstrated the efficacy of MORE based on the pre-trained models of both single and multiple modalities.
Danyang Hou, Liang Pang, Huawei Shen, Xueqi Cheng
Video corpus moment retrieval (VCMR) is a new video retrieval task aimed at
retrieving a relevant moment from a large corpus of untrimmed videos using a
natural language text as query. The relevance between the video and query is
partial, mainly evident in two aspects: (1) Scope: The untrimmed video contains
information-rich frames, and not all are relevant to the query. Strong
correlation is typically observed only within the relevant moment, emphasizing
the importance of capturing key content. (2) Modality: The relevance of query
to different modalities varies; action descriptions align more with the visual
elements, while character conversations are more related to textual
information. Recognizing and addressing these modality-specific nuances is
crucial for effective retrieval in VCMR. However, existing methods often treat
all video contents equally, leading to sub-optimal moment retrieval. We argue
that effectively capturing the partial relevance between the query and video is
essential for the VCMR task. To this end, we propose a Partial Relevance
Enhanced Model (PREM) to improve VCMR. VCMR involves two sub-tasks: video
retrieval and moment localization. To align with their distinct objectives, we
implement specialized partial relevance enhancement strategies. For video
retrieval, we introduce a multi-modal collaborative video retriever, generating
distinct query representations tailored for different modalities by
modality-specific pooling, ensuring a more effective match. For moment
localization, we propose the focus-then-fuse moment localizer, utilizing
modality-specific gates to capture essential content, followed by fusing
multi-modal information for moment localization. Experimental results on TVR
and DiDeMo datasets show that the proposed model outperforms the baselines,
achieving a new state of the art for VCMR.
Authors' comments: 10 pages, 6 figures, 6 tables
Quanyu Long, Yue Deng, LeiLei Gan, Wenya Wang, Sinno Jialin Pan
Dense retrievers and retrieval-augmented language models have been widely used in various NLP applications. Despite being designed to deliver reliable and secure outcomes, the vulnerability of retrievers to potential attacks remains unclear, raising concerns about their security. In this paper, we introduce a novel scenario where the attackers aim to covertly disseminate targeted misinformation, such as hate speech or advertisement, through a retrieval system. To achieve this, we propose a perilous backdoor attack triggered by grammar errors in dense passage retrieval. Our approach ensures that attacked models can function normally for standard queries but are manipulated to return passages specified by the attacker when users unintentionally make grammatical mistakes in their queries. Extensive experiments demonstrate the effectiveness and stealthiness of our proposed attack method. When a user query is error-free, our model consistently retrieves accurate information while effectively filtering out misinformation from the top-k results. However, when a query contains grammar errors, our system shows a significantly higher success rate in fetching the targeted content.
Minju Seo, Jinheon Baek, James Thorne, Sung Ju Hwang
Despite large successes of recent language models on diverse tasks, they suffer from severe performance degeneration in low-resource settings with limited training data available. Many existing works tackle this problem by generating synthetic data from the training data and then training models on them, recently using Large Language Models (LLMs). However, in low-resource settings, the amount of seed data samples to use for data augmentation is very small, which makes generated samples suboptimal and less diverse. To tackle this challenge, we propose a novel method that augments training data by incorporating a wealth of examples from other datasets, along with the given training data. Specifically, we first retrieve the relevant instances from other datasets, such as their input-output pairs or contexts, based on their similarities with the given seed data, and then prompt LLMs to generate new samples with the contextual information within and across the original and retrieved samples. This approach can ensure that the generated data is not only relevant but also more diverse than what could be achieved using the limited seed data alone. We validate our proposed Retrieval-Augmented Data Augmentation (RADA) framework on multiple datasets under low-resource settings of training and test-time data augmentation scenarios, on which it outperforms existing LLM-powered data augmentation baselines.
Pei-Syuan Wang, Hung-Hsuan Chen
This paper details an empirical investigation into using Graph Contrastive Learning (GCL) to generate mathematical equation representations, a critical aspect of Mathematical Information Retrieval (MIR). Our findings reveal that this simple approach consistently exceeds the performance of the current leading formula retrieval model, TangentCFT. To support ongoing research and development in this field, we have made our source code accessible to the public at https://github.com/WangPeiSyuan/GCL-Formula-Retrieval/.
Xiaoyue Wang, Jianyou Wang, Weili Cao, Kaicheng Wang, Ramamohan Paturi, Leon Bergen
We present the Benchmark of Information Retrieval (IR) tasks with Complex Objectives (BIRCO). BIRCO evaluates the ability of IR systems to retrieve documents given multi-faceted user objectives. The benchmark's complexity and compact size make it suitable for evaluating large language model (LLM)-based information retrieval systems. We present a modular framework for investigating factors that may influence LLM performance on retrieval tasks, and identify a simple baseline model which matches or outperforms existing approaches and more complex alternatives. No approach achieves satisfactory performance on all benchmark tasks, suggesting that stronger models and new retrieval protocols are necessary to address complex user needs.
Stephan Goerttler, Fei He, Min Wu
Heat diffusion describes the process by which heat flows from areas with
higher temperatures to ones with lower temperatures. This concept was
previously adapted to graph structures, whereby heat flows between nodes of a
graph depending on the graph topology. Here, we combine the graph heat equation
with the stochastic heat equation, which ultimately yields a model for
multivariate time signals on a graph. We show theoretically how the model can
be used to directly compute the diffusion-based connectivity structure from
multivariate signals. Unlike other connectivity measures, our heat model-based
approach is inherently multivariate and yields an absolute scaling factor,
namely the graph thermal diffusivity, which captures the extent of heat-like
graph propagation in the data. On two datasets, we show how the graph thermal
diffusivity can be used to characterise Alzheimer's disease. We find that the
graph thermal diffusivity is lower for Alzheimer's patients than for healthy
controls and correlates with dementia scores, suggesting structural impairment
in patients in line with previous findings.
Authors' comments: 4 pages, 1 figure, conference paper
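
As background for the graph heat equation mentioned in the abstract, the deterministic part can be written as dx/dt = -c L x, where L is the graph Laplacian and c plays the role of a thermal diffusivity. The sketch below integrates that equation with explicit Euler steps on a three-node path graph. It leaves out the stochastic term and the connectivity estimation that the paper adds, and the graph, step size, diffusivity value, and function names are assumptions chosen for illustration.

```python
def laplacian(adjacency):
    """Graph Laplacian L = D - A for a symmetric adjacency matrix."""
    n = len(adjacency)
    return [
        [(sum(adjacency[i]) if i == j else 0.0) - adjacency[i][j] for j in range(n)]
        for i in range(n)
    ]


def heat_step(x, lap, diffusivity, dt):
    """One explicit Euler step of dx/dt = -diffusivity * L x."""
    n = len(x)
    return [
        x[i] - dt * diffusivity * sum(lap[i][j] * x[j] for j in range(n))
        for i in range(n)
    ]


# Hypothetical path graph 0 - 1 - 2 with all heat initially on node 0.
adjacency = [
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 1.0],
    [0.0, 1.0, 0.0],
]
lap = laplacian(adjacency)
x = [1.0, 0.0, 0.0]
for _ in range(500):
    # Assumed values; dt * diffusivity must stay small for Euler to be stable.
    x = heat_step(x, lap, diffusivity=0.5, dt=0.1)
```

Total heat is conserved at every step, and on a connected graph the signal relaxes towards the uniform state; a larger diffusivity makes that relaxation faster.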
Zhen Yang, Zhou Shao, Yuxiao Dong, Jie Tang
Negative sampling stands as a pivotal technique in dense retrieval, essential
for training effective retrieval models and significantly impacting retrieval
performance. While existing negative sampling methods have made commendable
progress by leveraging hard negatives, a comprehensive guiding principle for
constructing negative candidates and designing negative sampling distributions
is still lacking. To bridge this gap, we embark on a theoretical analysis of
negative sampling in dense retrieval. This exploration culminates in the
unveiling of the quasi-triangular principle, a novel framework that elucidates
the triangular-like interplay between query, positive document, and negative
document. Fueled by this guiding principle, we introduce TriSampler, a
straightforward yet highly effective negative sampling method. The key point of
TriSampler lies in its ability to selectively sample more informative negatives
within a prescribed constrained region. Experimental evaluations show that
TriSampler consistently attains superior retrieval performance across a diverse
set of representative retrieval models.
Authors' comments: 9 pages, 4 figures
Yizheng Huang, Jimmy Huang
The rapid advancement of artificial intelligence (AI) has highlighted ChatGPT
as a pivotal technology in the field of information retrieval (IR).
Distinguished from its predecessors, ChatGPT offers significant benefits that
have attracted the attention of both the industry and academic communities.
While some view ChatGPT as a groundbreaking innovation, others attribute its
success to the effective integration of product development and market
strategies. The emergence of ChatGPT, alongside GPT-4, marks a new phase in
Generative AI, generating content that is distinct from training examples and
exceeding the capabilities of the prior GPT-3 model by OpenAI. Unlike the
traditional supervised learning approach in IR tasks, ChatGPT challenges
existing paradigms, bringing forth new challenges and opportunities regarding
text quality assurance, model bias, and efficiency. This paper seeks to examine
the impact of ChatGPT on IR tasks and offer insights into its potential future
developments.
Authors' comments: Survey Paper