Suchitra Krishnaswamy, Fabian Schule, Laura Ares, Vladyslav Dyachuk, Michael Stefszky, Benjamin Brecht, Christine Silberhorn, Jan Sperling
We utilize click-counting theory for the reconstruction of photon statistics.
Our approach employs an analytic pseudo-inversion method to estimate photon
counts from measured click counts. A reconfigurable time-bin multiplexing,
click-counting detector is set up that renders it possible to alter the
photon-number resolution as needed. A detector tomography is carried out,
yielding vital measurement features, such as quantum efficiencies, cross-talk
rates, etc. We gauge the success of the pseudo-inversion by applying the Mandel
and binomial parameters, resulting in an additional interpretation of these
parameters for the discrimination of distinct quantum statistics. In addition,
we apply a loss deconvolution technique to account for detection losses.
Authors' comments: 10 pages, 8 figures
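As a rough illustration of the two parameters named in this abstract, the sketch below computes the Mandel parameter Q_M = Var(n)/<n> - 1 from a photon-number distribution and the binomial parameter Q_B = N Var(c) / (<c>(N - <c>)) - 1 from a click-counting distribution over N detection bins. The function names and the example distributions are assumptions made for illustration; they are not taken from the paper, and the sketch does not cover its pseudo-inversion or loss deconvolution.

```python
from math import comb, exp, factorial


def moments(dist):
    """Mean and variance of a distribution given as probabilities dist[k]."""
    mean = sum(k * p for k, p in enumerate(dist))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(dist))
    return mean, var


def mandel_q(photon_dist):
    """Mandel parameter: zero for Poisson statistics, negative for sub-Poisson."""
    mean, var = moments(photon_dist)
    return var / mean - 1.0


def binomial_q(click_dist):
    """Binomial parameter for a detector with N = len(click_dist) - 1 bins.

    Zero for binomial click statistics, negative for sub-binomial ones.
    Requires 0 < mean < N.
    """
    n_bins = len(click_dist) - 1
    mean, var = moments(click_dist)
    return n_bins * var / (mean * (n_bins - mean)) - 1.0


# Hypothetical example inputs.
poisson = [exp(-2.0) * 2.0**n / factorial(n) for n in range(60)]  # truncated Poisson
fock_two = [0.0, 0.0, 1.0]                                        # exactly two photons
n_bins, p = 8, 0.3
binomial = [comb(n_bins, k) * p**k * (1 - p) ** (n_bins - k) for k in range(n_bins + 1)]
```

With these inputs, `mandel_q(poisson)` and `binomial_q(binomial)` are both zero up to rounding, and `mandel_q(fock_two)` equals -1, the sub-Poisson limit.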
Di Wu, Wasi Uddin Ahmad, Dejiao Zhang, Murali Krishna Ramanathan, Xiaofei Ma
Recent advances in retrieval-augmented generation (RAG) have initiated a new era in repository-level code completion. However, the invariable use of retrieval in existing methods exposes issues in both efficiency and robustness, with a large proportion of the retrieved contexts proving unhelpful or harmful to code language models (code LMs). To tackle the challenges, this paper proposes a selective RAG framework where retrieval is avoided when unnecessary. To power this framework, we design a self-supervised learning approach that enables a code LM to accurately self-evaluate whether retrieval can improve its output quality and robustly leverage the potentially noisy retrieved contexts. Using this LM as both the selective retrieval policy and the generation model, our framework consistently outperforms the state-of-the-art prompting with an invariable retrieval approach on diverse benchmarks including RepoEval, CrossCodeEval, and a new benchmark. Meanwhile, our selective retrieval strategy results in strong efficiency improvements by as much as 70% inference speedup without harming the performance. We demonstrate that our framework effectively accommodates different generation models, retrievers, and programming languages. These advancements position our framework as an important step towards more accurate and efficient repository-level code completion.
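The control flow of selective retrieval can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function `selective_complete`, the threshold, and the toy stand-ins for the policy, retriever, and generator are all hypothetical. In the paper a single code LM plays both the policy and the generator roles.

```python
from typing import Callable, List


def selective_complete(
    prompt: str,
    generate: Callable[[str, List[str]], str],
    estimate_benefit: Callable[[str], float],
    retrieve: Callable[[str], List[str]],
    threshold: float = 0.5,
) -> str:
    """Complete `prompt`, retrieving context only when the policy expects a benefit."""
    if estimate_benefit(prompt) < threshold:
        return generate(prompt, [])  # skip retrieval entirely
    return generate(prompt, retrieve(prompt))


# Toy stand-ins (hypothetical) so that the sketch runs end to end.
retrieval_calls: List[str] = []


def toy_retrieve(prompt: str) -> List[str]:
    retrieval_calls.append(prompt)  # record how often retrieval is paid for
    return ["def helper(): ..."]


def toy_benefit(prompt: str) -> float:
    # Pretend that only prompts calling an unseen helper need cross-file context.
    return 0.9 if "helper(" in prompt else 0.1


def toy_generate(prompt: str, contexts: List[str]) -> str:
    return f"{prompt} <{len(contexts)} contexts>"
```

Calling `selective_complete("x = 1 +", toy_generate, toy_benefit, toy_retrieve)` never touches the retriever, while a prompt containing `helper(` triggers exactly one retrieval.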
H. R. Tizhoosh
The paper reviews the state of the art of foundation models, LLMs, generative
AI, information retrieval and CBIR in digital pathology.
Authors' comments: This is the preprint of a book chapter to appear in "Artificial
Intelligence in Pathology" by Stanley Cohen and Chhavi Chauhan
Jennifer Hsia, Afreen Shaikh, Zhiruo Wang, Graham Neubig
Retrieval-augmented generation (RAG) can significantly improve the performance of language models (LMs) by providing additional context for tasks such as document-based question answering (DBQA). However, the effectiveness of RAG is highly dependent on its configuration. To systematically find the optimal configuration, we introduce RAGGED, a framework for analyzing RAG configurations across various DBQA tasks. Using the framework, we discover distinct LM behaviors in response to varying context quantities, context qualities, and retrievers. For instance, while some models are robust to noisy contexts, monotonically performing better with more contexts, others are more noise-sensitive and can effectively use only a few contexts before declining in performance. This framework also provides a deeper analysis of these differences by evaluating the LMs' sensitivity to signal and noise under specific context quality conditions. Using RAGGED, researchers and practitioners can derive actionable insights about how to optimally configure their RAG systems for their specific question-answering tasks.
Yuanhang Zheng, Peng Li, Wei Liu, Yang Liu, Jian Luan, Bin Wang
Tool learning aims to extend the capabilities of large language models (LLMs)
with external tools. A major challenge in tool learning is how to support a
large number of tools, including unseen tools. To address this challenge,
previous studies have proposed retrieving suitable tools for the LLM based on
the user query. However, previously proposed methods do not consider the
differences between seen and unseen tools, nor do they take the hierarchy of
the tool library into account, which may lead to suboptimal performance for
tool retrieval. Therefore, to address the aforementioned issues, we propose
ToolRerank, an adaptive and hierarchy-aware reranking method for tool retrieval
to further refine the retrieval results. Specifically, our proposed ToolRerank
includes Adaptive Truncation, which truncates the retrieval results related to
seen and unseen tools at different positions, and Hierarchy-Aware Reranking,
which makes retrieval results more concentrated for single-tool queries and
more diverse for multi-tool queries. Experimental results show that ToolRerank
can improve the quality of the retrieval results, leading to better execution
results generated by the LLM.
Authors' comments: This paper is accepted for LREC-COLING 2024
Subhadeep Koley, Ayan Kumar Bhunia, Aneeshan Sain, Pinaki Nath Chowdhury, Tao Xiang, Yi-Zhe Song
In this paper, we propose a novel abstraction-aware sketch-based image
retrieval framework capable of handling sketch abstraction at varied levels.
While prior works have mainly focused on tackling sub-factors such as drawing style
and order, we instead attempt to model abstraction as a whole, and propose
feature-level and retrieval granularity-level designs so that the system builds
into its DNA the necessary means to interpret abstraction. On learning
abstraction-aware features, we for the first time harness the rich semantic
embedding of a pre-trained StyleGAN model, together with a novel
abstraction-level mapper that deciphers the level of abstraction and
dynamically selects appropriate dimensions in the feature matrix
correspondingly, to construct a feature matrix embedding that can be freely
traversed to accommodate different levels of abstraction. For granularity-level
abstraction understanding, we dictate that the retrieval model should not treat
all abstraction-levels equally and introduce a differentiable surrogate Acc.@q
loss to inject that understanding into the system. Different to the
gold-standard triplet loss, our Acc.@q loss uniquely allows a sketch to
narrow/broaden its focus in terms of how stringent the evaluation should be -
the more abstract a sketch, the less stringent (higher q). Extensive
experiments show that our method outperforms existing state-of-the-art methods in
standard SBIR tasks along with challenging scenarios like early retrieval,
forensic sketch-photo matching, and style-invariant retrieval.
Authors' comments: Accepted in CVPR 2024. Project page available at
https://subhadeepkoley.github.io/AbstractAway
Asal Rouhafzay, Nadia Baaziz, Mohand Said Allili
In this paper, we propose a new framework for improving Content Based Image
Retrieval (CBIR) for texture images. This is achieved by using a new image
representation based on the RCT-Plus transform, which is a novel variant of the
Redundant Contourlet transform that extracts richer directional information
from the image. Moreover, the process of image search is improved through a
learning-based approach where the images of the database are classified using
a similarity metric adapted to the statistical modeling of the RCT-Plus
transform. A query is then first classified to select the best texture class,
after which the retained class images are ranked to select the top ones. By this,
we have achieved significant improvements in the retrieval rates compared to
previous CBIR schemes.
Authors' comments: 14 pages, 6 figures, The 25th International Conference on Image
Processing, Computer Vision, & Pattern Recognition (IPCV'21: July 26-29,
2021, USA)
Haochen Han, Qinghua Zheng, Guang Dai, Minnan Luo, Jingdong Wang
Collecting well-matched multimedia datasets is crucial for training
cross-modal retrieval models. However, in real-world scenarios, massive
multimodal data are harvested from the Internet, which inevitably contain
Partially Mismatched Pairs (PMPs). Undoubtedly, such semantically irrelevant data
will remarkably harm the cross-modal retrieval performance. Previous efforts
tend to mitigate this problem by estimating a soft correspondence to
down-weight the contribution of PMPs. In this paper, we aim to address this
challenge from a new perspective: the potential semantic similarity among
unpaired samples makes it possible to excavate useful knowledge from mismatched
pairs. To achieve this, we propose L2RM, a general framework based on Optimal
Transport (OT) that learns to rematch mismatched pairs. In detail, L2RM aims to
generate refined alignments by seeking a minimal-cost transport plan across
different modalities. To formalize the rematching idea in OT, first, we propose
a self-supervised cost function that automatically learns from an explicit
similarity-cost mapping relation. Second, we propose to model a partial OT
problem while restricting the transport among false positives to further boost
refined alignments. Extensive experiments on three benchmarks demonstrate that
L2RM significantly improves the robustness against PMPs for existing models.
The code is available at https://github.com/hhc1997/L2RM.
Authors' comments: CVPR 2024
Wenqi Jiang, Shuai Zhang, Boran Han, Jie Wang, Bernie Wang, Tim Kraska
Retrieval-augmented generation (RAG) can enhance the generation quality of large language models (LLMs) by incorporating external token databases. However, retrievals from large databases can constitute a substantial portion of the overall generation time, particularly when retrievals are periodically performed to align the retrieved content with the latest states of generation. In this paper, we introduce PipeRAG, a novel algorithm-system co-design approach to reduce generation latency and enhance generation quality. PipeRAG integrates (1) pipeline parallelism to enable concurrent retrieval and generation processes, (2) flexible retrieval intervals to maximize the efficiency of pipeline parallelism, and (3) a performance model to automatically balance retrieval quality and latency based on the generation states and underlying hardware. Our evaluation shows that, by combining the three aforementioned methods, PipeRAG achieves up to 2.6$\times$ speedup in end-to-end generation latency while improving generation quality. These promising results showcase the effectiveness of co-designing algorithms with underlying systems, paving the way for the adoption of PipeRAG in future RAG systems.
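The first ingredient, overlapping retrieval with generation, can be sketched with a background worker that prefetches context for the next chunk while the current chunk is being generated. This is a simplified illustration under stated assumptions, not PipeRAG itself: the function names and toy callables are hypothetical, and the flexible retrieval intervals and the performance model are not modelled.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def pipelined_generate(
    prompt: str,
    retrieve: Callable[[str], object],
    generate_chunk: Callable[[str, object], str],
    n_chunks: int,
) -> str:
    """Generate `n_chunks` chunks, hiding retrieval latency behind generation."""
    text = prompt
    context = retrieve(text)  # the first retrieval cannot be overlapped
    with ThreadPoolExecutor(max_workers=1) as pool:
        for _ in range(n_chunks):
            # Prefetch with the current state; it is slightly stale once used.
            pending = pool.submit(retrieve, text)
            text += generate_chunk(text, context)  # runs while retrieval is in flight
            context = pending.result()
    return text


# Deterministic toy stand-ins (hypothetical) to show which context each chunk sees.
def toy_retrieve(query: str) -> int:
    return len(query)


def toy_generate_chunk(text: str, context: int) -> str:
    return f"[{context}]"
```

The price of the overlap is staleness: each chunk is conditioned on context retrieved one step earlier, which is the trade-off between retrieval quality and latency that the abstract describes.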
Lixu Wang, Xinyu Du, Qi Zhu
Cross-domain retrieval (CDR), as a crucial tool for numerous technologies, is
finding increasingly broad applications. However, existing efforts face several
major issues, with the most critical being the need for accurate supervision,
which often demands costly resources and efforts. Cutting-edge studies focus on
achieving unsupervised CDR but typically assume that the category spaces across
domains are identical, an assumption that is often unrealistic in real-world
scenarios. This is because only through dedicated and comprehensive analysis
can the category spaces of different domains be confirmed as identical, which
contradicts the premise of unsupervised scenarios. Therefore, in this work, we
introduce the problem of Universal Unsupervised Cross-Domain Retrieval (U^2CDR)
for the first time and design a two-stage semantic feature learning framework
to address it. In the first stage, a cross-domain unified prototypical
structure is established under the guidance of an instance-prototype-mixed
contrastive loss and a semantic-enhanced loss, to counteract category space
differences. In the second stage, through a modified adversarial training
mechanism, we ensure minimal changes for the established prototypical structure
during domain alignment, enabling more accurate nearest-neighbor searching.
Extensive experiments across multiple datasets and scenarios, including closed-set,
partial, and open-set CDR, demonstrate that our approach significantly
outperforms existing state-of-the-art CDR works and some potentially effective
studies from other topics in solving U^2CDR challenges.
Authors' comments: 18 pages, 4 figures, ongoing work
Bin Liang, Bingbing Wang, Zhixin Bai, Qiwei Lang, Mingwei Sun, Kaiheng Hou, Lanjun Zhou, Ruifeng Xu et al.
Using stickers in online chatting is very prevalent on social media platforms, where the stickers used in the conversation can express someone's intention/emotion/attitude in a vivid, tactful, and intuitive way. Existing sticker retrieval research typically retrieves stickers based on context and the current utterance delivered by the user. That is, the stickers serve as a supplement to the current utterance. However, in the real-world scenario, using stickers to express what we want to say rather than as a supplement to our words only is also important. Therefore, in this paper, we create a new dataset for sticker retrieval in conversation, called StickerInt, where stickers are used to reply to previous conversations or supplement our words. We believe that the release of this dataset will provide a more complete paradigm than existing work for the research of sticker retrieval in the open-domain online conversation. Based on the created dataset, we present a simple yet effective framework for sticker retrieval in conversation based on the learning of intention and the cross-modal relationships between conversation context and stickers, coined as Int-RA. Specifically, we first devise a knowledge-enhanced intention predictor to introduce the intention information into the conversation representations. Subsequently, a relation-aware sticker selector is devised to retrieve the response sticker via cross-modal relationships. Extensive experiments on the created dataset show that the proposed model achieves state-of-the-art performance in sticker retrieval. The dataset and source code of this work are released at https://github.com/HITSZ-HLT/Int-RA.
L. S. Dolan, E. J. W. de Mooij, C. A. Watson, D. G. Jackson
Stellar activity and planetary effects induce radial velocity (RV) offsets
and cause temporal distortions in the shape of the stellar line profile. Hence,
accurately probing the stellar line profile offers a wealth of information on
both the star itself and any orbiting planets. Typically, Cross-Correlation
Functions (CCFs) are used as a proxy for the stellar line profile. The shape of
CCFs, however, can be distorted by line blending and aliasing, limiting the
stellar and planetary physics that can be probed from them. Least-squares
deconvolution (LSD) offers an alternative that directly fits the mean line
profile of the spectrum to produce a high-precision profile. In this paper, we
introduce our novel method ACID (Accurate Continuum fItting and Deconvolution)
that builds on LSD techniques by simultaneously fitting the spectral continuum
and line profile as well as performing LSD in effective optical depth. Tests on
model data revealed ACID can accurately identify and correct the spectral
continuum to retrieve an injected line profile. ACID was also applied to
archival HARPS data obtained during the transit of HD189733b. The application
of the Reloaded Rossiter-McLaughlin technique to both ACID profiles and HARPS
CCFs shows ACID residual profiles improved the out-of-line RMS by over 5%
compared to CCFs. Furthermore, ACID profiles are shown to exhibit a Voigt
profile shape that better describes the expected profile shape of the stellar
line profile. This improved representation shows that ACID better preserves the
stellar and planetary physics encoded in the stellar line profile shape for
slowly rotating stars.
Authors' comments: 16 pages, 13 figures. Accepted for Publication in Monthly Notices of
the Royal Astronomical Society
Savvas Constantinou, Nikku Madhusudhan
JWST observations are leading to important new insights into exoplanetary
atmospheres through transmission spectroscopy. In order to harness the full
potential of the broad spectral range and high sensitivity of JWST, atmospheric
retrievals of exoplanets require a high level of robustness and accuracy in the
underlying models. We present the VIRA retrieval framework which implements a
range of modelling and inference capabilities motivated by early JWST
observations of exoplanet transmission spectra. This includes three
complementary approaches to modelling atmospheric composition, three
atmospheric aerosol models, including a physically-motivated Mie scattering
approach, and consideration of correlated noise. VIRA enables a cascading
retrieval architecture involving a sequence of retrievals with increasing
sophistication. We demonstrate VIRA using a JWST transmission spectrum of the
hot Saturn WASP-39 b in the $\sim$1-5 $\mu$m range. In addition to confirming
prior chemical inferences, we retrieve molecular abundances for H$_2$O, CO,
CO$_2$, SO$_2$ and H$_2$S, resulting in super-solar elemental abundances of
log(O/H)=$-2.0\pm0.2$, log(C/H)=$-2.1\pm0.2$ and log(S/H)=$-3.6\pm0.2$, along
with C/O and S/O ratios of $0.83^{+0.05}_{-0.07}$ and
$0.029^{+0.012}_{-0.009}$, respectively, in the free chemistry case. The
abundances correspond to $20.1^{+10.5}_{-8.1}\times$,
$28.2^{+16.3}_{-12.1}\times$ and $20.8^{+10.3}_{-7.5}\times$ solar values for
O/H, C/H and S/H, respectively, compared to C/H $= 8.67\pm0.35 \times$ solar
for Saturn. Our results demonstrate how JWST transmission spectroscopy combined
with retrieval frameworks like VIRA can measure multi-elemental abundances for
giant exoplanets and enable comparative characterisation with solar system
planets.
Authors' comments: Accepted for publication in MNRAS
Yupeng Hou, Jiacheng Li, Zhankui He, An Yan, Xiusi Chen, Julian McAuley
This paper introduces BLaIR, a series of pretrained sentence embedding models specialized for recommendation scenarios. BLaIR is trained to learn correlations between item metadata and potential natural language context, which is useful for retrieving and recommending items. To pretrain BLaIR, we collect Amazon Reviews 2023, a new dataset comprising over 570 million reviews and 48 million items from 33 categories, significantly expanding beyond the scope of previous versions. We evaluate the generalization ability of BLaIR across multiple domains and tasks, including a new task named complex product search, referring to retrieving relevant items given long, complex natural language contexts. Leveraging large language models like ChatGPT, we correspondingly construct a semi-synthetic evaluation set, Amazon-C4. Empirical results on the new task, as well as conventional retrieval and recommendation tasks, demonstrate that BLaIR exhibits strong text and item representation capacity. Our datasets, code, and checkpoints are available at: https://github.com/hyp1231/AmazonReviews2023.
Chao-Wei Huang, Chen-An Li, Tsu-Yuan Hsu, Chen-Yu Hsu, Yun-Nung Chen
Dense retrieval methods have demonstrated promising performance in
multilingual information retrieval, where queries and documents can be in
different languages. However, dense retrievers typically require a substantial
amount of paired data, which poses even greater challenges in multilingual
scenarios. This paper introduces UMR, an Unsupervised Multilingual dense
Retriever trained without any paired data. Our approach leverages the sequence
likelihood estimation capabilities of multilingual language models to acquire
pseudo labels for training dense retrievers. We propose a two-stage framework
which iteratively improves the performance of multilingual dense retrievers.
Experimental results on two benchmark datasets show that UMR outperforms
supervised baselines, showcasing the potential of training multilingual
retrievers without paired data, thereby enhancing their practicality. Our
source code, data, and models are publicly available at
https://github.com/MiuLab/UMR
Authors' comments: Accepted to Findings of EACL 2024
Akari Asai, Zexuan Zhong, Danqi Chen, Pang Wei Koh, Luke Zettlemoyer, Hannaneh Hajishirzi, Wen-tau Yih
Parametric language models (LMs), which are trained on vast amounts of web data, exhibit remarkable flexibility and capability. However, they still face practical challenges such as hallucinations, difficulty in adapting to new data distributions, and a lack of verifiability. In this position paper, we advocate for retrieval-augmented LMs to replace parametric LMs as the next generation of LMs. By incorporating large-scale datastores during inference, retrieval-augmented LMs can be more reliable, adaptable, and attributable. Despite their potential, retrieval-augmented LMs have yet to be widely adopted due to several obstacles: specifically, current retrieval-augmented LMs struggle to leverage helpful text beyond knowledge-intensive tasks such as question answering, have limited interaction between retrieval and LM components, and lack the infrastructure for scaling. To address these, we propose a roadmap for developing general-purpose retrieval-augmented LMs. This involves a reconsideration of datastores and retrievers, the exploration of pipelines with improved retriever-LM interaction, and significant investment in infrastructure for efficient training and inference.
Heydar Soudani, Evangelos Kanoulas, Faegheh Hasibi
Language Models (LMs) memorize a vast amount of factual knowledge, exhibiting strong performance across diverse tasks and domains. However, it has been observed that the performance diminishes when dealing with less-popular or low-frequency concepts and entities, for example in domain-specific applications. The two prominent approaches to enhance the performance of LMs on low-frequency topics are Retrieval Augmented Generation (RAG) and fine-tuning (FT) over synthetic data. This paper explores and evaluates the impact of RAG and FT on customizing LMs in handling low-frequency entities on question answering tasks. We conduct extensive experiments on twelve LMs of varying size and type and different fine-tuning, data augmentation, and retrieval models. Our findings indicate that while FT boosts the performance across entities of varying popularity, RAG surpasses FT by a large margin, particularly for the least popular factual knowledge. Additionally, the success of both RAG and FT approaches is amplified by improving retrieval and data augmentation techniques. Fine-tuning, while beneficial for small LMs, requires extensive resources. To address this issue, we propose the new Stimulus RAG approach that surpasses the effectiveness of fine-tuning-based approaches, thereby eliminating the need for the costly data augmentation and fine-tuning step for enriching LMs with less popular factual knowledge. The code is available at https://github.com/informagi/RAGvsFT.
Philip Feldman, James R. Foulds, Shimei Pan
Large language models (LLMs) like ChatGPT demonstrate the remarkable progress
of artificial intelligence. However, their tendency to hallucinate -- generate
plausible but false information -- poses a significant challenge. This issue is
critical, as seen in recent court cases where ChatGPT's use led to citations of
non-existent legal rulings. This paper explores how Retrieval-Augmented
Generation (RAG) can counter hallucinations by integrating external knowledge
with prompts. We empirically evaluate RAG against standard LLMs using prompts
designed to induce hallucinations. Our results show that RAG increases accuracy
in some cases, but can still be misled when prompts directly contradict the
model's pre-trained understanding. These findings highlight the complex nature
of hallucinations and the need for more robust solutions to ensure LLM
reliability in real-world applications. We offer practical recommendations for
RAG deployment and discuss implications for the development of more trustworthy
LLMs.
Authors' comments: 7 Pages, 1 Figure, 1 Table
Kangning Yin, Shihao Zou, Yuxuan Ge, Zheng Tian
Information retrieval is an ever-evolving and crucial research domain. The substantial demand for high-quality human motion data, especially for online acquisition, has led to a surge in human motion research. Prior works have mainly concentrated on dual-modality learning, such as text and motion tasks, but three-modality learning has rarely been explored. Intuitively, an additional modality can enrich a model's application scenarios and, more importantly, an adequate choice of the extra modality can also act as an intermediary and enhance the alignment between the other two disparate modalities. In this work, we introduce LAVIMO (LAnguage-VIdeo-MOtion alignment), a novel framework for three-modality learning integrating human-centric videos as an additional modality, thereby effectively bridging the gap between text and motion. Moreover, our approach leverages a specially designed attention mechanism to foster enhanced alignment and synergistic effects among text, video, and motion modalities. Empirically, our results on the HumanML3D and KIT-ML datasets show that LAVIMO achieves state-of-the-art performance in various motion-related cross-modal retrieval tasks, including text-to-motion, motion-to-text, video-to-motion and motion-to-video.
Hui Wu, Min Wang, Wengang Zhou, Houqiang Li
Asymmetric image retrieval is a task that seeks to balance retrieval accuracy and efficiency by leveraging lightweight and large models for the query and gallery sides, respectively. The key to asymmetric image retrieval is realizing feature compatibility between different models. Despite the great progress, most existing approaches either rely on classifiers inherited from gallery models or simply impose constraints at the instance level, ignoring the structure of embedding space. In this work, we propose a simple yet effective structure similarity preserving method to achieve feature compatibility between query and gallery models. Specifically, we first train a product quantizer offline with the image features embedded by the gallery model. The centroid vectors in the quantizer serve as anchor points in the embedding space of the gallery model to characterize its structure. During the training of the query model, anchor points are shared by the query and gallery models. The relationships between image features and centroid vectors are considered as structure similarities and constrained to be consistent. Moreover, our approach makes no assumption about the existence of any labeled training data and thus can be extended to an unlimited amount of data. Comprehensive experiments on large-scale landmark retrieval demonstrate the effectiveness of our approach. Our code is released at: https://github.com/MCC-WH/SSP.
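One way to picture the structure-similarity constraint is sketched below: each feature is described by a softmax over its negative squared distances to shared anchor points, and the query model is penalised when its distribution diverges from the gallery model's. The abstract does not specify the similarity function or the loss, so the softmax, the temperature, the KL divergence, and all names here are assumptions; the product-quantizer training that yields the anchors is also omitted.

```python
from math import exp, log


def structure_similarity(feature, anchors, temperature=1.0):
    """Softmax over negative squared distances from `feature` to each anchor."""
    logits = [
        -sum((f - a) ** 2 for f, a in zip(feature, anchor)) / temperature
        for anchor in anchors
    ]
    peak = max(logits)  # subtract the maximum for numerical stability
    weights = [exp(l - peak) for l in logits]
    total = sum(weights)
    return [w / total for w in weights]


def consistency_loss(query_feature, gallery_feature, anchors):
    """KL divergence from the gallery-side structure to the query-side structure."""
    p = structure_similarity(gallery_feature, anchors)
    q = structure_similarity(query_feature, anchors)
    return sum(pi * log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


# Hypothetical anchors (stand-ins for quantizer centroids) and embeddings.
anchors = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
gallery_feature = [0.9, 0.1]
aligned_query = [0.9, 0.1]
misaligned_query = [0.1, 0.9]
```

The loss is zero for `aligned_query` and positive for `misaligned_query`, and it needs no labels, which matches the abstract's point that the method makes no assumption about labeled training data.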