Marco Faltelli, Giacomo Belocchi, Francesco Quaglia, Salvatore Pontarelli, Giuseppe Bianchi
The increasing performance requirements of modern applications place a significant burden on software-based packet processing. Most of today's software input/output accelerations achieve high performance at the expense of reserving CPU resources dedicated to continuously polling the Network Interface Card. This is specifically the case with DPDK (Data Plane Development Kit), probably the most widely used framework for software-based packet processing today. The approach presented in this paper, descriptively called Metronome, has the dual goals of providing CPU utilization proportional to the load, and allowing flexible sharing of CPU resources between I/O tasks and applications. Metronome replaces DPDK's continuous polling with an intermittent sleep&wake mode, and revolves around a new multi-threaded operation, which improves service continuity. Since the proposed operation trades CPU usage for buffering delay, we propose an analytical model devised to dynamically adapt the sleep&wake parameters to the actual traffic load while providing a target average latency. Our experimental results show a significant reduction of the CPU cycles, improvements in power usage, and robustness to CPU sharing even when challenged with CPU-intensive applications.
Florian Boudin, Ygor Gallina
Neural keyphrase generation models have recently attracted much interest due
to their ability to output absent keyphrases, that is, keyphrases that do not
appear in the source text. In this paper, we discuss the usefulness of absent
keyphrases from an Information Retrieval (IR) perspective, and show that the
commonly drawn distinction between present and absent keyphrases is not made
explicit enough. We introduce a finer-grained categorization scheme that sheds
more light on the impact of absent keyphrases on scientific document retrieval.
Under this scheme, we find that only a fraction (around 20%) of the words that
make up keyphrases actually serves as document expansion, but that this small
fraction of words is behind much of the gains observed in retrieval
effectiveness. We also discuss how the proposed scheme can offer a new angle to
evaluate the output of neural keyphrase generation models.
Authors' comments: Accepted at NAACL 2021
Jonathan Herzig, Thomas Müller, Syrine Krichene, Julian Martin Eisenschlos
Recent advances in open-domain QA have led to strong models based on dense
retrieval, but only focused on retrieving textual passages. In this work, we
tackle open-domain QA over tables for the first time, and show that retrieval
can be improved by a retriever designed to handle tabular context. We present
an effective pre-training procedure for our retriever and improve retrieval
quality with mined hard negatives. As relevant datasets are missing, we extract
a subset of Natural Questions (Kwiatkowski et al., 2019) into a Table QA
dataset. We find that our retriever improves retrieval results from 72.0 to
81.1 recall@10 and end-to-end QA results from 33.8 to 37.7 exact match, over a
BERT based retriever.
Authors' comments: NAACL 2021 camera ready
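The recall@10 figures quoted in the abstract above can be read with a small sketch of one common definition of recall@k for retrieval-based QA: the fraction of questions for which at least one gold item appears among the top k retrieved items. This is an illustrative assumption, not necessarily the exact evaluation script used by the authors, and the table identifiers below are made up.

```python
from typing import Sequence, Set


def hit_at_k(ranked: Sequence[str], gold: Set[str], k: int) -> bool:
    """True if any gold item appears in the first k retrieved items."""
    return any(item in gold for item in ranked[:k])


def recall_at_k(runs: Sequence[Sequence[str]], golds: Sequence[Set[str]], k: int) -> float:
    """Fraction of queries with at least one gold item in the top k."""
    hits = sum(hit_at_k(ranked, gold, k) for ranked, gold in zip(runs, golds))
    return hits / len(runs)


# Hypothetical retrieval output for two questions (table ids are invented).
runs = [["t3", "t1", "t7"], ["t2", "t9", "t4"]]
golds = [{"t1"}, {"t5"}]

print(recall_at_k(runs, golds, k=2))  # 0.5: only the first question is a hit
```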
Onifade Olufade, Arise Abiola, Ogboo Chisom
Getting relevant information from search engines has been at the heart of research in information retrieval. Query expansion is a retrieval technique that has been studied and shown to yield positive results in relevance. Users are required to express their queries as a short list of words, sentences, or questions. With this short format, a huge amount of information is lost in the process of translating the information need into the actual query, since the user cannot convey all their thoughts in a few words. This mostly leads to poor query representation, which contributes to poor retrieval effectiveness. This loss of information has made query expansion techniques a strong area of study. This research work focuses on two methods of retrieval for both tweet-length queries and sentence-length queries. Two algorithms have been proposed, and the implementation is expected to produce a better relevance retrieval model than most state-of-the-art relevance models.
Qingyao Ai, Brendan O'Connor, W. Bruce Croft
Traditional statistical retrieval models often treat each document as a whole. In many cases, however, a document is relevant to a query only because a small part of it contains the targeted information. In this work, we propose a neural passage model (NPM) that uses passage-level information to improve the performance of ad-hoc retrieval. Instead of using a single window to extract passages, our model automatically learns to weight passages with different granularities in the training process. We show that the passage-based document ranking paradigm from previous studies can be directly derived from our neural framework. Also, our experiments on a TREC collection showed that the NPM can significantly outperform the existing passage-based retrieval models.
Luis Welbanks, Nikku Madhusudhan
Atmospheric retrievals of exoplanetary transmission spectra provide important
constraints on various properties such as chemical abundances, cloud/haze
properties, and characteristic temperatures, at the day-night atmospheric
terminator. To date, most spectra have been observed for giant exoplanets, which
is why retrievals typically assume H-rich atmospheres. However, recent
observations of mini-Neptunes/super-Earths, and the promise of upcoming
facilities including JWST, call for a new generation of retrievals that can
address a wide range of atmospheric compositions and related complexities. Here
we report Aurora, a next-generation atmospheric retrieval framework that builds
upon state-of-the-art architectures and incorporates the following key
advancements: a) a generalised compositional retrieval allowing for H-rich and
H-poor atmospheres, b) a generalised prescription for inhomogeneous
clouds/hazes, c) multiple Bayesian inference algorithms for high-dimensional
retrievals, d) modular considerations for refraction, forward scattering, and
Mie-scattering, and e) noise modeling functionalities. We demonstrate Aurora on
current and/or synthetic observations of hot Jupiter HD209458b, mini-Neptune
K218b, and rocky exoplanet TRAPPIST1d. Using current HD209458b spectra, we
demonstrate the robustness of our framework and cloud/haze prescription against
assumptions of H-rich/H-poor atmospheres, improving on previous treatments.
Using real and synthetic spectra of K218b, we demonstrate the agnostic approach
to confidently constrain its bulk atmospheric composition and obtain precise
abundance estimates. For TRAPPIST1d, 10 JWST NIRSpec transits can enable
identification of the main atmospheric component for cloud-free CO$_2$-rich and
N$_2$-rich atmospheres, and abundance constraints on trace gases including
initial indications of O$_3$ if present at enhanced levels ($\sim$10-100x Earth
levels).
Authors' comments: Accepted for publication in ApJ
Tianyu Zhao, Qiaojun Feng, Sai Jadhav, Nikolay Atanasov
This paper considers online object-level mapping using partial point-cloud
observations obtained online in an unknown environment. We develop an approach
for fully Convolutional Object Retrieval and Symmetry-AIded Registration
(CORSAIR). Our model extends the Fully Convolutional Geometric Features model
to learn a global object-shape embedding in addition to local point-wise
features from the point-cloud observations. The global feature is used to
retrieve a similar object from a category database, and the local features are
used for robust pose registration between the observed and the retrieved
object. Our formulation also leverages symmetries, present in the object
shapes, to obtain promising local-feature pairs from different symmetry classes
for matching. We present results from synthetic and real-world datasets with
different object categories to verify the robustness of our method.
Authors' comments: 8 pages, 8 figures
Chunbin Gu, Jiajun Bu, Xixi Zhou, Chengwei Yao, Dongfang Ma, Zhi Yu, Xifeng Yan
In this paper, we study cross-modal image retrieval, where the inputs
contain a source image plus some text that describes certain modifications to
this image and the desired image. Prior work usually uses a three-stage
strategy to tackle this task: 1) extract the features of the inputs; 2) fuse
the feature of the source image and its modified text to obtain a fusion feature;
3) learn a similarity metric between the desired image and the source image +
modified text by using deep metric learning. Since classical image/text
encoders can learn the useful representation and common pair-based loss
functions of distance metric learning are enough for cross-modal retrieval,
people usually improve retrieval accuracy by designing new fusion networks.
However, these methods do not successfully handle the modality gap caused by
the inconsistent distribution and representation of the features of different
modalities, which greatly influences the feature fusion and similarity
learning. To alleviate this problem, we incorporate the contrastive
self-supervised learning method Deep InforMax (DIM) into our approach to bridge
this gap by enhancing the dependence between the text, the image, and their fusion.
Specifically, our method narrows the modality gap between the text modality and
the image modality by maximizing mutual information between their
representations, which are not exactly semantically identical. Moreover, we seek an effective common
subspace for the semantically same fusion feature and desired image's feature
by utilizing Deep InforMax between the low-level layer of the image encoder and
the high-level layer of the fusion network. Extensive experiments on three
large-scale benchmark datasets show that we have bridged the modality gap
between different modalities and achieve state-of-the-art retrieval
performance.
Authors' comments: 35 pages, 7 figures, submitted to Neuralcomputing
Alex Jones, Derry Tanti Wijaya
Obtaining high-quality parallel corpora is of paramount importance for training NMT systems. However, as many language pairs lack adequate gold-standard training data, a popular approach has been to mine so-called "pseudo-parallel" sentences from paired documents in two languages. In this paper, we outline some problems with current methods, propose computationally economical solutions to those problems, and demonstrate success with novel methods on the Tatoeba similarity search benchmark and on a downstream task, namely NMT. We uncover the effect of resource-related factors (i.e. how much monolingual/bilingual data is available for a given language) on the optimal choice of bitext mining approach, and echo problems with the oft-used BUCC dataset that have been observed by others. We make the code and data used for our experiments publicly available.
Ziqi Zhang, Zhongang Qi, Chunfeng Yuan, Ying Shan, Bing Li, Ying Deng, Weiming Hu
Due to the rapid emergence of short videos and the requirement for content
understanding and creation, the video captioning task has received increasing
attention in recent years. In this paper, we convert the traditional video
captioning task into a new paradigm, i.e., Open-book Video Captioning, which
generates natural language under the prompts of video-content-relevant
sentences, not limited to the video itself. To address the open-book video
captioning problem, we propose a novel Retrieve-Copy-Generate network, where a
pluggable video-to-text retriever is constructed to retrieve sentences as hints
from the training corpus effectively, and a copy-mechanism generator is
introduced to extract expressions from multi-retrieved sentences dynamically.
The two modules can be trained end-to-end or separately, which is flexible and
extensible. Our framework coordinates the conventional retrieval-based methods
with orthodox encoder-decoder methods, which can not only draw on the diverse
expressions in the retrieved sentences but also generate natural and accurate
content of the video. Extensive experiments on several benchmark datasets show
that our proposed approach surpasses the state-of-the-art performance,
indicating the effectiveness and promise of the proposed paradigm in the task
of video captioning.
Authors' comments: Accepted by CVPR 2021
Jayaprakash A, Abhishek, Rishabh Dabral, Ganesh Ramakrishnan, Preethi Jyothi
Video retrieval using natural language queries requires learning semantically meaningful joint embeddings between the text and the audio-visual input. Often, such joint embeddings are learnt using pairwise (or triplet) contrastive loss objectives which cannot give enough attention to 'difficult-to-retrieve' samples during training. This problem is especially pronounced in data-scarce settings where the data is too small (10% of the large-scale MSR-VTT) to cover the rather complex audio-visual embedding space. In this context, we introduce Rudder - a multilingual video-text retrieval dataset that includes audio and textual captions in Marathi, Hindi, Tamil, Kannada, Malayalam and Telugu. Furthermore, we propose to compensate for data scarcity by using domain knowledge to augment supervision. To this end, in addition to the conventional three samples of a triplet (anchor, positive, and negative), we introduce a fourth term - a partial - to define a differential-margin-based partial-order loss. The partials are heuristically sampled such that they semantically lie in the overlap zone between the positives and the negatives, thereby resulting in broader embedding coverage. Our proposals consistently outperform the conventional max-margin and triplet losses and improve the state-of-the-art on MSR-VTT and DiDeMO datasets. We report benchmark results on Rudder while also observing significant gains using the proposed partial order loss, especially when the language-specific retrieval models are jointly trained by exploiting the cross-lingual alignment across the language-specific datasets.
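To make the idea of a fourth "partial" sample concrete, here is a minimal sketch that contrasts a conventional max-margin triplet loss with one hypothetical way of ordering positive, partial, and negative samples using two separate margins. The abstract does not give the actual formula, so the function `partial_order_loss`, its two margin parameters, and the Euclidean distance are all assumptions made for illustration, not the authors' definition.

```python
import math
from typing import Sequence


def dist(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, closer, farther, margin: float = 1.0) -> float:
    """Conventional max-margin triplet loss: `closer` should be nearer
    to the anchor than `farther` by at least `margin`."""
    return max(0.0, margin + dist(anchor, closer) - dist(anchor, farther))


def partial_order_loss(anchor, pos, partial, neg,
                       m_pos_partial: float = 0.5,
                       m_partial_neg: float = 0.5) -> float:
    """Hypothetical illustration: enforce the ordering
    positive < partial < negative (by distance to the anchor),
    with a separate margin for each adjacent pair."""
    return (triplet_loss(anchor, pos, partial, m_pos_partial)
            + triplet_loss(anchor, partial, neg, m_partial_neg))


anchor, pos, neg = (0.0, 0.0), (0.0, 1.0), (0.0, 4.0)
well_placed_partial = (0.0, 2.0)   # sits clearly between pos and neg
crowded_partial = (0.0, 1.2)       # too close to the positive

print(partial_order_loss(anchor, pos, well_placed_partial, neg))
print(partial_order_loss(anchor, pos, crowded_partial, neg))
```

In this toy setup the ordinary triplet loss is already zero, while the second call is penalized because the partial sits too close to the positive; that extra signal is the kind of added supervision the abstract describes.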
Hailong Ning, Bin Zhao, Yuan Yuan
With the development of earth observation technology, massive amounts of remote sensing (RS) images are acquired. To find useful information from these images, cross-modal RS image-voice retrieval provides a new insight. This paper aims to study the task of RS image-voice retrieval so as to search effective information from massive amounts of RS data. Existing methods for RS image-voice retrieval rely primarily on the pairwise relationship to narrow the heterogeneous semantic gap between images and voices. However, apart from the pairwise relationship included in the datasets, the intra-modality and non-paired inter-modality relationships should also be taken into account simultaneously, since the semantic consistency among non-paired representations plays an important role in the RS image-voice retrieval task. Inspired by this, a semantics-consistent representation learning (SCRL) method is proposed for RS image-voice retrieval. The main novelty is that the proposed method takes the pairwise, intra-modality, and non-paired inter-modality relationships into account simultaneously, thereby improving the semantic consistency of the learned representations for the RS image-voice retrieval. The proposed SCRL method consists of two main steps: 1) semantics encoding and 2) semantics-consistent representation learning. Firstly, an image encoding network is adopted to extract high-level image features with a transfer learning strategy, and a voice encoding network with dilated convolution is devised to obtain high-level voice features. Secondly, a consistent representation space is constructed by modeling the three kinds of relationships to narrow the heterogeneous semantic gap and learn semantics-consistent representations across two modalities. Extensive experimental results on three challenging RS image-voice datasets show the effectiveness of the proposed method.
Jiafeng Guo, Yinqiong Cai, Yixing Fan, Fei Sun, Ruqing Zhang, Xueqi Cheng
Multi-stage ranking pipelines have been a practical solution in modern search
systems, where the first-stage retrieval is to return a subset of candidate
documents, and later stages attempt to re-rank those candidates. Unlike
re-ranking stages going through quick technique shifts during past decades, the
first-stage retrieval has long been dominated by classical term-based models.
Unfortunately, these models suffer from the vocabulary mismatch problem, which
may block re-ranking stages from relevant documents at the very beginning.
Therefore, it has been a long-term desire to build semantic models for the
first-stage retrieval that can achieve high recall efficiently. Recently, we
have witnessed an explosive growth of research interest in first-stage
semantic retrieval models. We believe it is the right time to survey the current
status, learn from existing methods, and gain some insights for future
development. In this paper, we describe the current landscape of the
first-stage retrieval models under a unified framework to clarify the
connection between classical term-based retrieval methods, early semantic
retrieval methods and neural semantic retrieval methods. Moreover, we identify
some open challenges and envision some future directions, with the hope of
inspiring more research on these important yet less investigated topics.
Authors' comments: Accepted by TOIS
Youfa Li, Yaoshuai Ma, Deguang Han
While frequency-resolved optical gating (FROG) is widely used in characterizing the ultrafast pulse in optics, analytic signals are often considered in time-frequency analysis and signal processing, especially when extracting instantaneous features of events. In this paper we examine the phase retrieval (PR) problem of analytic signals in $\Bbb{C}^N$ by their FROG measurements. After establishing the ambiguity of the FROG-PR of analytic signals, we found that the FROG-PR of analytic signals of even lengths is different from that of analytic signals of odd lengths, and it is also different from the case of $B$-bandlimited signals with $B \leq N/2$. The existing approach to bandlimited signals can be applied to analytic signals of odd lengths, but it does not apply to the even length case. With the help of two relaxed FROG-PR problems and a translation technique, we develop an approach to FROG-PR for the analytic signals of even lengths, and prove that in this case the generic analytic signals can be uniquely (up to the ambiguity) determined by their $(3N/2+1)$ FROG measurements.
Eleni Partalidou, Despina Christou, Grigorios Tsoumakas
Entity Linking (EL) seeks to align entity mentions in text to entries in a
knowledge-base and is usually comprised of two phases: candidate generation and
candidate ranking. While most methods focus on the latter, it is the candidate
generation phase that sets an upper bound to both time and accuracy performance
of the overall EL system. This work's contribution is a significant improvement
in candidate generation which thus raises the performance threshold for EL, by
generating candidates that include the gold entity in the least candidate set
(top-K). We propose a simple approach that efficiently embeds mention-entity
pairs in dense space through a BERT-based bi-encoder. Specifically, we extend
(Wu et al., 2020) by introducing a new pooling function and incorporating
entity type side-information. We achieve a new state-of-the-art 84.28% accuracy
on top-50 candidates on the Zeshel dataset, compared to the previous 82.06% on
the top-64 of (Wu et al., 2020). We report the results from extensive
experimentation using our proposed model on both seen and unseen entity
datasets. Our results suggest that our method could be a useful complement to
existing EL approaches.
Authors' comments: 8 pages, 2 figures
Xiaodan Li, Jinfeng Li, Yuefeng Chen, Shaokai Ye, Yuan He, Shuhui Wang, Hang Su, Hui Xue
We study the query-based attack against image retrieval to evaluate its robustness against adversarial examples under the black-box setting, where the adversary only has query access to the top-k ranked unlabeled images from the database. Compared with query attacks in image classification, which produce adversaries according to the returned labels or confidence score, the challenge becomes even more prominent due to the difficulty in quantifying the attack effectiveness on the partial retrieved list. In this paper, we make the first attempt in Query-based Attack against Image Retrieval (QAIR), to completely subvert the top-k retrieval results. Specifically, a new relevance-based loss is designed to quantify the attack effects by measuring the set similarity on the top-k retrieval results before and after attacks and guide the gradient optimization. To further boost the attack efficiency, a recursive model stealing method is proposed to acquire transferable priors on the target model and generate the prior-guided gradients. Comprehensive experiments show that the proposed attack achieves a high attack success rate with few queries against the image retrieval systems under the black-box setting. The attack evaluations on the real-world visual search engine show that it successfully deceives a commercial system such as Bing Visual Search with 98% attack success rate by only 33 queries on average.
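As a rough illustration of "measuring the set similarity on the top-k retrieval results before and after attacks", the sketch below computes the fraction of the original top-k items that survive in the attacked top-k. This simple overlap ratio is an assumption made for illustration; the relevance-based loss in the paper may, for example, weight items by rank. The image identifiers are invented.

```python
from typing import Sequence


def topk_overlap(before: Sequence[str], after: Sequence[str], k: int) -> float:
    """Fraction of the original top-k that is still in the top-k after
    the query image has been perturbed. 0.0 means the top-k list has
    been completely subverted; 1.0 means the attack changed nothing."""
    original = set(before[:k])
    attacked = set(after[:k])
    return len(original & attacked) / k


before = ["a", "b", "c", "d"]   # ranked ids returned for the clean query
after = ["c", "x", "y", "a"]    # ranked ids returned for the perturbed query

print(topk_overlap(before, after, k=3))  # one of three survives
print(topk_overlap(before, after, k=4))  # two of four survive
```

An attacker with only query access could use a score of this kind as a black-box objective to minimize, since it depends only on the returned ranked lists.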
Yixing Fan, Jiafeng Guo, Xinyu Ma, Ruqing Zhang, Yanyan Lan, Xueqi Cheng
Relevance plays a central role in information retrieval (IR), which has received extensive studies starting from the 20th century. The definition and the modeling of relevance have always been critical challenges in both information science and computer science research areas. Along with the debate and exploration on relevance, IR has already become a core task in many real-world applications, such as Web search engines, question answering systems, conversational bots, and so on. While relevance acts as a unified concept in all these retrieval tasks, the inherent definitions are quite different due to the heterogeneity of these tasks. This raises a question to us: Do these different forms of relevance really lead to different modeling focuses? To answer this question, in this work, we conduct an empirical study on relevance modeling in three representative IR tasks, i.e., document retrieval, answer retrieval, and response retrieval. Specifically, we attempt to study the following two questions: 1) Does relevance modeling in these tasks really show differences in terms of natural language understanding (NLU)? We employ 16 linguistic tasks to probe a unified retrieval model over these three retrieval tasks to answer this question. 2) If there do exist differences, how can we leverage the findings to enhance the relevance modeling? We propose three intervention methods to investigate how to leverage different modeling focuses of relevance to improve these IR tasks. We believe the way we study the problem as well as our findings would be beneficial to the IR community.
Mark Hamilton, Scott Lundberg, Lei Zhang, Stephanie Fu, William T. Freeman
Visual search, recommendation, and contrastive similarity learning power technologies that impact billions of users worldwide. Modern model architectures can be complex and difficult to interpret, and there are several competing techniques one can use to explain a search engine's behavior. We show that the theory of fair credit assignment provides a $\textit{unique}$ axiomatic solution that generalizes several existing recommendation- and metric-explainability techniques in the literature. Using this formalism, we show when existing approaches violate "fairness" and derive methods that sidestep these shortcomings and naturally handle counterfactual information. More specifically, we show existing approaches implicitly approximate second-order Shapley-Taylor indices and extend CAM, GradCAM, LIME, SHAP, SBSM, and other methods to search engines. These extensions can extract pairwise correspondences between images from trained $\textit{opaque-box}$ models. We also introduce a fast kernel-based method for estimating Shapley-Taylor indices that requires orders of magnitude fewer function evaluations to converge. Finally, we show that these game-theoretic measures yield more consistent explanations for image similarity architectures.
Chen Wu, Ruqing Zhang, Jiafeng Guo, Yixing Fan, Yanyan Lan, Xueqi Cheng
Ranked list truncation is of critical importance in a variety of professional information retrieval applications such as patent search or legal search. The goal is to dynamically determine the number of returned documents according to some user-defined objectives, in order to reach a balance between the overall utility of the results and user efforts. Existing methods formulate this task as a sequential decision problem and take some pre-defined loss as a proxy objective, which suffers from the limitation of local decision and non-direct optimization. In this work, we propose a global decision based truncation model named AttnCut, which directly optimizes user-defined objectives for the ranked list truncation. Specifically, we take the successful transformer architecture to capture the global dependency within the ranked list for truncation decision, and employ the reward augmented maximum likelihood (RAML) for direct optimization. We consider two types of user-defined objectives which are of practical usage. One is the widely adopted metric such as F1 which acts as a balanced objective, and the other is the best F1 under some minimal recall constraint which represents a typical objective in professional search. Empirical results over the Robust04 and MQ2007 datasets demonstrate the effectiveness of our approach as compared with the state-of-the-art baselines.
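The two user-defined objectives mentioned above (F1, and best F1 under a minimal recall constraint) can be illustrated with a small oracle that scans every cut position of a ranked list with known relevance labels. This is only a sketch of the objectives, not of AttnCut itself; it assumes binary labels and that all relevant documents appear somewhere in the list.

```python
from typing import Sequence, Tuple


def f1_at_cut(labels: Sequence[int], k: int) -> float:
    """F1 of returning the first k documents; labels are 1 (relevant) or 0."""
    total_relevant = sum(labels)
    true_pos = sum(labels[:k])
    if k == 0 or true_pos == 0:
        return 0.0
    precision = true_pos / k
    recall = true_pos / total_relevant
    return 2 * precision * recall / (precision + recall)


def best_cut(labels: Sequence[int], min_recall: float = 0.0) -> Tuple[int, float]:
    """Oracle truncation point: the earliest cut with the highest F1 among
    cuts whose recall is at least `min_recall`."""
    total_relevant = sum(labels)
    best_k, best_f1 = 0, 0.0
    for k in range(1, len(labels) + 1):
        recall = sum(labels[:k]) / total_relevant if total_relevant else 0.0
        if recall < min_recall:
            continue
        f1 = f1_at_cut(labels, k)
        if f1 > best_f1:
            best_k, best_f1 = k, f1
    return best_k, best_f1


labels = [1, 0, 0, 1]                      # relevance of a ranked list
print(best_cut(labels))                    # plain F1 objective
print(best_cut(labels, min_recall=1.0))    # F1 subject to full recall
```

With these labels the unconstrained objective stops after the first document, while the recall constraint forces the cut to the end of the list, which shows how the two objectives can disagree.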
Jesús Andrés Portillo-Quintero, José Carlos Ortiz-Bayliss, Hugo Terashima-Marín
Video Retrieval is a challenging task where a text query is matched to a
video or vice versa. Most of the existing approaches for addressing such a
problem rely on annotations made by the users. Although simple, this approach
is not always feasible in practice. In this work, we explore the application of
the language-image model, CLIP, to obtain video representations without the
need for said annotations. This model was explicitly trained to learn a common
space where images and text can be compared. Using various techniques described
in this document, we extended its application to videos, obtaining
state-of-the-art results on the MSR-VTT and MSVD benchmarks.
Authors' comments: 10 pages, 1 figure, submitted to Mexican Conference for Pattern
Recognition (MCPR 2021); corrected results section and added model
specifications
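The abstract above does not spell out how frame-level image embeddings are turned into a video representation. One simple option, shown here purely as an assumed illustration, is to average per-frame embeddings and compare the result with a text embedding by cosine similarity. The vectors below are toy stand-ins for embeddings from an image-text model, and no actual CLIP API is called.

```python
import math
from typing import List, Sequence


def mean_pool(frame_embeddings: Sequence[Sequence[float]]) -> List[float]:
    """Average per-frame embeddings into a single video embedding."""
    n = len(frame_embeddings)
    return [sum(dim) / n for dim in zip(*frame_embeddings)]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Toy 2-d "frame embeddings" for one video and two candidate text queries.
frames = [[1.0, 0.0], [0.0, 1.0]]
video = mean_pool(frames)

matching_text = [1.0, 1.0]
unrelated_text = [1.0, -1.0]

print(cosine(video, matching_text))   # close to 1
print(cosine(video, unrelated_text))  # 0
```

Ranking candidate texts (or videos) by this similarity gives a basic text-to-video or video-to-text retrieval loop that needs no user annotations.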