Socratis Gkelios, Yiannis Boutalis, Savvas A. Chatzichristofis
This paper introduces a plug-and-play descriptor that can be effectively adopted for image retrieval tasks without prior initialization or preparation. The description method utilizes the recently proposed Vision Transformer network and does not require any training data to adjust parameters. In image retrieval tasks, handcrafted global and local descriptors have been very successfully replaced, over the last years, by Convolutional Neural Network (CNN)-based methods. However, the experimental evaluation conducted in this paper on several benchmarking datasets against 36 state-of-the-art descriptors from the literature demonstrates that a neural network that contains no convolutional layer, such as Vision Transformer, can shape a global descriptor and achieve competitive results. As fine-tuning is not required, the presented methodology's low complexity encourages adoption of the architecture as an image retrieval baseline model, replacing the traditional and well-adopted CNN-based approaches and inaugurating a new era in image retrieval approaches.
Ana B. Ruescas, Martin Hieronymi, Sampsa Koponen, Kari Kallio, Gustau Camps-Valls
The coloured dissolved organic matter (CDOM) concentration is the standard
measure of humic substance in natural waters. CDOM measurements by remote
sensing are calculated using the absorption coefficient (a) at a certain
wavelength (e.g. 440nm). This paper presents a comparison of four machine
learning methods for the retrieval of CDOM from remote sensing signals:
regularized linear regression (RLR), random forest (RF), kernel ridge
regression (KRR) and Gaussian process regression (GPR). Results are compared
with the established polynomial regression algorithms. RLR is revealed as the
simplest and most efficient method, followed closely by its nonlinear
counterpart KRR.
Authors' comments: 7 pages, 4 figures
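For orientation, the two best-performing estimators named above have standard closed forms. The notation below is assumed here for illustration and is not taken from the paper: $\mathbf{X}$ is the matrix of remote sensing inputs, $\mathbf{y}$ the CDOM targets, $\lambda$ the regularization weight, and $k$ a kernel function.

```latex
% Regularized linear regression (ridge): weights and prediction
\hat{\mathbf{w}} = (\mathbf{X}^\top \mathbf{X} + \lambda \mathbf{I})^{-1} \mathbf{X}^\top \mathbf{y},
\qquad \hat{y}(\mathbf{x}) = \hat{\mathbf{w}}^\top \mathbf{x}

% Kernel ridge regression: dual coefficients and prediction
\boldsymbol{\alpha} = (\mathbf{K} + \lambda \mathbf{I})^{-1} \mathbf{y},
\qquad \hat{y}(\mathbf{x}) = \sum_{i=1}^{N} \alpha_i \, k(\mathbf{x}_i, \mathbf{x}),
\qquad K_{ij} = k(\mathbf{x}_i, \mathbf{x}_j)
```

With the linear kernel $k(\mathbf{x}, \mathbf{x}') = \mathbf{x}^\top \mathbf{x}'$, KRR gives the same predictions as RLR, which is the sense in which KRR is its nonlinear counterpart.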
Yang Liu, Keze Wang, Haoyuan Lan, Liang Lin
Attempting to fully discover the temporal diversity for self-supervised video
representation learning, this work takes advantage of the temporal dependencies
of videos and further proposes a novel self-supervised method named Temporal
Contrastive Graph Learning (TCGL). In contrast to the existing methods that
consider the temporal dependency from a single scale, our TCGL is rooted in a
hybrid graph contrastive learning strategy to regard the inter-snippet and
intra-snippet temporal dependencies as self-supervision signals for temporal
representation learning. To learn multi-scale temporal dependencies, the TCGL
integrates the prior knowledge about the frame and snippet orders into graph
structures, i.e., the intra-/inter- snippet temporal contrastive graph modules.
By randomly removing edges and masking node features of the intra-snippet
graphs or inter-snippet graphs, the TCGL can generate different correlated
graph views. Then, specific contrastive modules are designed to maximize the
agreement between node embeddings in different views. To learn the global
context representation and recalibrate the channel-wise features adaptively, we
introduce an adaptive video snippet order prediction module, which leverages
the relational knowledge among video snippets to predict the actual snippet
orders. Experimental results demonstrate the superiority of our TCGL over the
state-of-the-art methods on large-scale action recognition and video retrieval
benchmarks.
Authors' comments: 11 pages, 6 figures
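The view-generation step described above (randomly removing edges and masking node features) can be sketched as follows. This is a minimal illustration under assumptions: the function name `make_view`, the drop probabilities, and the choice to zero out whole feature dimensions are hypothetical and are not taken from the TCGL paper.

```python
import random


def make_view(edges, features, edge_drop=0.2, feat_mask=0.2, seed=None):
    """Return one randomly corrupted view of a graph.

    edges:    list of (u, v) pairs (e.g. links between frames or snippets)
    features: list of per-node feature vectors, all of the same length
    Both probabilities are illustrative hyperparameters (assumption).
    """
    rng = random.Random(seed)
    # Keep each edge independently with probability 1 - edge_drop.
    kept_edges = [e for e in edges if rng.random() >= edge_drop]
    # Pick feature dimensions to mask, then zero them for every node.
    dim = len(features[0])
    masked_dims = {d for d in range(dim) if rng.random() < feat_mask}
    masked = [
        [0.0 if d in masked_dims else x for d, x in enumerate(f)]
        for f in features
    ]
    return kept_edges, masked


# Two correlated views of the same 4-node chain graph.
chain_edges = [(0, 1), (1, 2), (2, 3)]
chain_feats = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
view_a = make_view(chain_edges, chain_feats, seed=1)
view_b = make_view(chain_edges, chain_feats, seed=2)
```

A contrastive objective would then pull together the embeddings of the same node in `view_a` and `view_b`; that part depends on the encoder and is not sketched here.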
Omar Khattab, Christopher Potts, Matei Zaharia
Multi-hop reasoning (i.e., reasoning across two or more documents) is a key
ingredient for NLP models that leverage large corpora to exhibit broad
knowledge. To retrieve evidence passages, multi-hop models must contend with a
fast-growing search space across the hops, represent complex queries that
combine multiple information needs, and resolve ambiguity about the best order
in which to hop between training passages. We tackle these problems via Baleen,
a system that improves the accuracy of multi-hop retrieval while learning
robustly from weak training signals in the many-hop setting. To tame the search
space, we propose condensed retrieval, a pipeline that summarizes the retrieved
passages after each hop into a single compact context. To model complex
queries, we introduce a focused late interaction retriever that allows
different parts of the same query representation to match disparate relevant
passages. Lastly, to infer the hopping dependencies among unordered training
passages, we devise latent hop ordering, a weak-supervision strategy in which
the trained retriever itself selects the sequence of hops. We evaluate Baleen
on retrieval for two-hop question answering and many-hop claim verification,
establishing state-of-the-art performance.
Authors' comments: NeurIPS 2021 (Spotlight)
Shaobo Li, Xiaoguang Li, Lifeng Shang, Xin Jiang, Qun Liu, Chengjie Sun, Zhenzhou Ji, Bingquan Liu
Collecting supporting evidence from large corpora of text (e.g., Wikipedia)
is a great challenge for open-domain Question Answering (QA). In particular, for
multi-hop open-domain QA, scattered evidence pieces are required to be gathered
together to support the answer extraction. In this paper, we propose a new
retrieval target, hop, to collect the hidden reasoning evidence from Wikipedia
for complex question answering. Specifically, the hop in this paper is defined
as the combination of a hyperlink and the corresponding outbound link document.
The hyperlink is encoded as the mention embedding which models the structured
knowledge of how the outbound link entity is mentioned in the textual context,
and the corresponding outbound link document is encoded as the document
embedding representing the unstructured knowledge within it. Accordingly, we
build HopRetriever which retrieves hops over Wikipedia to answer complex
questions. Experiments on the HotpotQA dataset demonstrate that HopRetriever
outperforms previously published evidence retrieval methods by large margins.
Moreover, our approach also yields quantifiable interpretations of the evidence
collection process.
Authors' comments: Accepted at AAAI 2021
Sangwoong Yoon, Woo Young Kang, Sungwook Jeon, SeongEun Lee, Changjin Han, Jonghun Park, Eun-Sol Kim
As a scene graph compactly summarizes the high-level content of an image in a
structured and symbolic manner, the similarity between scene graphs of two
images reflects the relevance of their contents. Based on this idea, we propose
a novel approach for image-to-image retrieval using scene graph similarity
measured by graph neural networks. In our approach, graph neural networks are
trained to predict the proxy image relevance measure, computed from
human-annotated captions using a pre-trained sentence similarity model. We
collect and publish the dataset for image relevance measured by human
annotators to evaluate retrieval algorithms. The collected dataset shows that
our method agrees better with the human perception of image similarity than other
competitive baselines.
Authors' comments: Accepted to AAAI 2021
Ivan Montero, Shayne Longpre, Ni Lao, Andrew J. Frank, Christopher DuBois
Existing methods for open-retrieval question answering in lower resource languages (LRLs) lag significantly behind English. They not only suffer from the shortcomings of non-English document retrieval, but are reliant on language-specific supervision for either the task or translation. We formulate a task setup more realistic to available resources, that circumvents document retrieval to reliably transfer knowledge from English to lower resource languages. Assuming a strong English question answering model or database, we compare and analyze methods that pivot through English: to map foreign queries to English and then English answers back to target language answers. Within this task setup we propose Reranked Multilingual Maximal Inner Product Search (RM-MIPS), akin to semantic similarity retrieval over the English training set with reranking, which outperforms the strongest baselines by 2.7% on XQuAD and 6.2% on MKQA. Analysis demonstrates the particular efficacy of this strategy over state-of-the-art alternatives in challenging settings: low-resource languages, with extensive distractor data and query distribution misalignment. Circumventing retrieval, our analysis shows this approach offers rapid answer generation to almost any language off-the-shelf, without the need for any additional training data in the target language.
Cheng Tang, Andrew Arnold
Recently, Nogueira et al. [2019] proposed a new approach to document expansion based on a neural Seq2Seq model, showing significant improvement on the short text retrieval task. However, this approach needs a large amount of in-domain training data. In this paper, we show that this neural document expansion approach can be effectively adapted to standard IR tasks, where labels are scarce and many long documents are present.
Md Mustafizur Rahman, Mucahid Kutlu, Matthew Lease
Research community evaluations in information retrieval, such as NIST's Text
REtrieval Conference (TREC), build reusable test collections by pooling
document rankings submitted by many teams. Naturally, the quality of the
resulting test collection thus greatly depends on the number of participating
teams and the quality of their submitted runs. In this work, we investigate: i)
how the number of participants, coupled with other factors, affects the quality
of a test collection; and ii) whether the quality of a test collection can be
inferred prior to collecting relevance judgments from human assessors.
Experiments conducted on six TREC collections illustrate how the number of
teams interacts with various other factors to influence the resulting quality
of test collections. We also show that the reusability of a test collection can
be predicted with high accuracy when the same document collection is used for
successive years in an evaluation campaign, as is common in TREC.
Authors' comments: Accepted as a full paper at iConference 2022
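The pooling step mentioned above can be illustrated with the common depth-k variant, in which the top-k documents of every submitted ranking are merged into the set sent to human assessors. The sketch below is only an illustration: the abstract does not state the pooling depth or method, and the team names, document ids and the function name `depth_k_pool` are hypothetical.

```python
def depth_k_pool(runs, k):
    """Union of the top-k documents from every submitted ranking for one topic."""
    pool = set()
    for ranking in runs.values():
        pool.update(ranking[:k])
    return pool


# Hypothetical rankings submitted by three teams for a single topic.
runs = {
    "team_a": ["d3", "d1", "d7", "d2"],
    "team_b": ["d1", "d4", "d3", "d9"],
    "team_c": ["d8", "d3", "d5", "d1"],
}
print(sorted(depth_k_pool(runs, k=2)))  # ['d1', 'd3', 'd4', 'd8']
```

Fewer participating teams, or teams submitting similar rankings, shrink the pool and leave more relevant documents unjudged. That is the dependence on the number and quality of runs that the abstract refers to.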
Ivano Lodato, Snehal M. Shekatkar, Tian An Wong
We consider a generalization of the classical 100 Prisoner problem and its
variant, involving empty boxes, whereby winning probabilities for a team depend
on the number of attempts, as well as on the number of winners. We call this
the unconstrained 100 prisoner problem. After introducing the 3 main classes of
strategies, we define a variety of `hybrid' strategies and quantify their
winning-efficiency. Whenever analytic results are not available, we make use of
Monte Carlo simulations to estimate with high accuracy the
winning-probabilities. Based on the results obtained, we conjecture that all
strategies, except for the strategy maximizing the winning probability of the
classical (constrained) problem, converge to the random strategy under weak
conditions on the number of players or empty boxes. We conclude by commenting
on the possible applications of our results in understanding processes of
information retrieval, such as ``memory'' in living organisms.
Authors' comments: Acta Informatica
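The Monte Carlo estimation mentioned above can be illustrated on the classical (constrained) problem only, where each prisoner follows the cycle of the box permutation that starts at their own number. The team wins exactly when no cycle is longer than the number of allowed attempts. This sketch is not the authors' simulation code: function names, trial count and seed are hypothetical, and it covers neither empty boxes nor the unconstrained variant.

```python
import random


def longest_cycle(perm):
    """Length of the longest cycle of a permutation given as a list."""
    seen = [False] * len(perm)
    longest = 0
    for start in range(len(perm)):
        length = 0
        j = start
        while not seen[j]:
            seen[j] = True
            j = perm[j]
            length += 1
        longest = max(longest, length)
    return longest


def simulate_cycle_strategy(n=100, attempts=50, trials=2000, seed=0):
    """Monte Carlo estimate of the team's winning probability."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        perm = list(range(n))
        rng.shuffle(perm)
        # Every prisoner finds their number iff all cycles fit in `attempts`.
        if longest_cycle(perm) <= attempts:
            wins += 1
    return wins / trials


def exact_cycle_strategy(n=100, attempts=50):
    """Closed form, valid only when attempts >= n / 2.

    A permutation of n elements contains a cycle of length k > n / 2
    with probability 1 / k, and at most one such cycle can exist.
    """
    return 1.0 - sum(1.0 / k for k in range(attempts + 1, n + 1))


estimate = simulate_cycle_strategy()
exact = exact_cycle_strategy()
print(f"Monte Carlo: {estimate:.3f}, exact: {exact:.3f}")
```

Comparing the simulated value with the closed form is a quick sanity check before using simulation on strategies for which no analytic result is available.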
Bhaskar Mitra
Neural networks with deep architectures have demonstrated significant
performance improvements in computer vision, speech recognition, and natural
language processing. The challenges in information retrieval (IR), however, are
different from these other application areas. A common form of IR involves
ranking of documents--or short passages--in response to keyword-based queries.
Effective IR systems must deal with the query-document vocabulary mismatch problem,
by modeling relationships between different query and document terms and how
they indicate relevance. Models should also consider lexical matches when the
query contains rare terms--such as a person's name or a product model
number--not seen during training, so as to avoid retrieving semantically related
but irrelevant results. In many real-life IR tasks, the retrieval involves
extremely large collections--such as the document index of a commercial Web
search engine--containing billions of documents. Efficient IR methods should
take advantage of specialized IR data structures, such as the inverted index, to
efficiently retrieve from large collections. Given an information need, the IR
system also mediates how much exposure an information artifact receives by
deciding whether it should be displayed, and where it should be positioned,
among other results. Exposure-aware IR systems may optimize for additional
objectives, besides relevance, such as parity of exposure for retrieved items
and content publishers. In this thesis, we present novel neural architectures
and methods motivated by the specific needs and challenges of IR tasks.
Authors' comments: PhD thesis, Univ College London (2020)
Qiuliang Ye, Yuk-Hee Chan, Michael G. Somekh, Daniel P. K. Lun
Phase retrieval with pre-defined optical masks can provide extra constraints and thus achieve improved performance. The recent progress in optimization theory demonstrates the superiority of random masks in phase retrieval algorithms. However, traditional approaches focus only on the randomness of the masks and ignore their non-bandlimited nature. When these masks are used in the reconstruction process for phase retrieval, the high frequency part of the masks is often removed, which leads to degraded performance. Based on the concept of digital halftoning, this paper proposes a green noise binary masking scheme which can greatly reduce the high frequency content of the masks while fulfilling the randomness requirement. The experimental results show that the proposed green noise binary masking scheme outperforms the traditional ones when used in a DMD-based coded diffraction pattern phase retrieval system.
Abdulmalik Johar
The main aim of an information retrieval system is to extract appropriate
information from an enormous collection of data based on the user's need. The
basic concept of an information retrieval system is that when a user sends out
a query, the system tries to generate a list of related documents ranked
according to their degree of relevance. The volume of digital unstructured
Silte text documents keeps increasing. The growth of digital text information
makes it difficult to access and use the right information. Thus,
developing an information retrieval system for the Silte language allows
searching and retrieving relevant documents that satisfy the information needs
of users. In this research, we design a probabilistic information retrieval
system for the Silte language. The system comprises both indexing and searching
modules. In these modules, different text operations such as tokenization,
stemming, stop word removal and synonym handling are included.
Authors' comments: 3 pages, 5 figures, 2 tables, 3 charts
Xuanmeng Zhang, Minyue Jiang, Zhedong Zheng, Xiao Tan, Errui Ding, Yi Yang
The re-ranking approach leverages high-confidence retrieved samples to refine retrieval results, and has been widely adopted as a post-processing tool for image retrieval tasks. However, we notice one main flaw of re-ranking, i.e., high computational complexity, which leads to an unaffordable time cost for real-world applications. In this paper, we revisit re-ranking and demonstrate that re-ranking can be reformulated as a high-parallelism Graph Neural Network (GNN) function. In particular, we divide the conventional re-ranking process into two phases, i.e., retrieving high-quality gallery samples and updating features. We argue that the first phase equals building the k-nearest neighbor graph, while the second phase can be viewed as spreading the message within the graph. In practice, the GNN only needs to consider vertices with connected edges. Since the graph is sparse, we can efficiently update the vertex features. On the Market-1501 dataset, we accelerate the re-ranking process from 89.2s to 9.4ms with one K40m GPU, facilitating real-time post-processing. Similarly, we observe that our method achieves comparable or even better retrieval results on the other four image retrieval benchmarks, i.e., VeRi-776, Oxford-5k, Paris-6k and University-1652, with limited time cost. Our code is publicly available.
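The two phases described above (building a k-nearest neighbor graph, then spreading messages along its edges) can be sketched in plain Python. This is a toy illustration under assumptions, not the paper's GPU implementation: cosine similarity, mean aggregation, the mixing weight `alpha` and all function names are hypothetical choices, and the sketch assumes at least two samples and k >= 1.

```python
import math


def normalize(v):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def knn_graph(features, k):
    """Phase 1: for every vertex, keep the indices of its k most similar vertices."""
    feats = [normalize(f) for f in features]
    graph = []
    for i, fi in enumerate(feats):
        sims = [
            (sum(a * b for a, b in zip(fi, fj)), j)
            for j, fj in enumerate(feats)
            if j != i
        ]
        sims.sort(reverse=True)
        graph.append([j for _, j in sims[:k]])
    return graph


def propagate(features, graph, alpha=0.5):
    """Phase 2: mix each vertex feature with the mean of its neighbours' features."""
    updated = []
    for i, fi in enumerate(features):
        neigh = graph[i]
        dim = len(fi)
        mean = [sum(features[j][d] for j in neigh) / len(neigh) for d in range(dim)]
        updated.append([(1 - alpha) * fi[d] + alpha * mean[d] for d in range(dim)])
    return updated


gallery = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
graph = knn_graph(gallery, k=1)
refined = propagate(gallery, graph)
```

Because each vertex only touches its k neighbours, the update cost grows with the number of edges rather than with the square of the gallery size. That sparsity is what makes the step easy to parallelize.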
David Malmgren-Hansen, Allan Aasbjerg Nielsen, Valero Laparra, Gustau Camps-Valls
The Infrared Atmospheric Sounding Interferometer (IASI) on board the MetOp satellite series provides important measurements for Numerical Weather Prediction (NWP). Retrieving accurate atmospheric parameters from the raw data provided by IASI is a major challenge, but necessary in order to use the data in NWP models. The performance of statistical models is compromised by the extremely high spectral dimensionality and the high number of variables to be predicted simultaneously across the atmospheric column. All this poses a challenge for selecting and studying optimal models and processing schemes. Earlier work has shown that non-linear models such as kernel methods and neural networks perform well on this task, but both schemes are computationally heavy on large quantities of data. Kernel methods do not scale well with the number of training data, and neural networks require setting critical hyperparameters. In this work we follow an alternative pathway: we study transfer learning in convolutional neural nets (CNNs) to alleviate the retraining cost by starting from proxy solutions (either features or networks) obtained from previously trained models for related variables. We show how features extracted from the IASI data by a CNN trained to predict a physical variable can be used as inputs to another statistical method designed to predict a different physical variable at low altitude. In addition, the learned parameters can be transferred to another CNN model to obtain results equivalent to those of a CNN trained from scratch, requiring only fine-tuning.
David Malmgren-Hansen, Valero Laparra, Allan Aasbjerg Nielsen, Gustau Camps-Valls
In this paper we present a combined strategy for the retrieval of atmospheric profiles from infrared sounders. The approach considers the spatial information and a noise-dependent dimensionality reduction approach. The extracted features are fed into a canonical linear regression. We compare Principal Component Analysis (PCA) and Minimum Noise Fraction (MNF) for dimensionality reduction, and study the compactness and information content of the extracted features. Assessment of the results is done on a big dataset covering many spatial and temporal situations. PCA is widely used for these purposes but our analysis shows that one can gain significant improvements of the error rates when using MNF instead. In our analysis we also investigate the relationship between error rate improvements when including more spectral and spatial components in the regression model, aiming to uncover the trade-off between model complexity and error rates.
Ana B. Ruescas, Gonzalo Mateo-Garcia, Gustau Camps-Valls, Martin Hieronymi
Water quality parameters are derived by applying several machine learning
regression methods to the Case2eXtreme dataset (C2X). The data used are based
on Hydrolight in-water radiative transfer simulations at Sentinel-3 OLCI
wavebands, and the application is done exclusively for absorbing waters with
high concentrations of coloured dissolved organic matter (CDOM). The regression
approaches are: regularized linear, random forest, kernel ridge, Gaussian
process and support vector regressors. The validation is made with an
independent simulation dataset. A comparison with the OLCI Neural Network Swarm
(ONNS) is made as well. The best approach is applied to a sample scene and
compared with the standard OLCI product delivered by EUMETSAT/ESA.
Authors' comments: 8 pages, 4 figures
Gautier Izacard, Edouard Grave
The task of information retrieval is an important component of many natural language processing systems, such as open domain question answering. While traditional methods were based on hand-crafted features, continuous representations based on neural networks recently obtained competitive results. A challenge of using such methods is to obtain supervised data to train the retriever model, corresponding to pairs of queries and support documents. In this paper, we propose a technique to learn retriever models for downstream tasks, inspired by knowledge distillation, which does not require annotated pairs of queries and documents. Our approach leverages attention scores of a reader model, used to solve the task based on retrieved documents, to obtain synthetic labels for the retriever. We evaluate our method on question answering, obtaining state-of-the-art results.
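One way to turn reader attention into synthetic labels for the retriever is sketched below. It normalizes the attention mass the reader assigns to each retrieved passage into a target distribution, then penalizes the retriever's softmax over its own scores with a KL divergence. The KL objective is an assumption made for illustration, since the abstract does not state the loss, and the function names and example numbers are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(reader_attention, retriever_scores):
    """KL(target || prediction) over the retrieved passages of one question.

    reader_attention: non-negative attention mass per passage (teacher signal)
    retriever_scores: unnormalized query-passage scores (student)
    """
    total = sum(reader_attention)
    target = [a / total for a in reader_attention]
    pred = softmax(retriever_scores)
    return sum(t * math.log(t / p) for t, p in zip(target, pred) if t > 0.0)


# The reader attends mostly to passage 0; the retriever currently prefers passage 2.
attention = [0.6, 0.3, 0.1]
scores = [0.2, 0.1, 1.5]
print(distillation_loss(attention, scores))
```

Minimizing this loss with respect to the retriever's parameters moves its ranking toward the passages the reader found useful, without any annotated query-document pairs.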
Antoine Maillard, Florent Krzakala, Yue M. Lu, Lenka Zdeborová
We consider the phase retrieval problem, in which the observer wishes to
recover an $n$-dimensional real or complex signal $\mathbf{X}^\star$ from the
(possibly noisy) observation of $|\mathbf{\Phi} \mathbf{X}^\star|$, in which
$\mathbf{\Phi}$ is a matrix of size $m \times n$. We consider a
\emph{high-dimensional} setting where $n,m \to \infty$ with $m/n =
\mathcal{O}(1)$, and a large class of (possibly correlated) random matrices
$\mathbf{\Phi}$ and observation channels. Spectral methods are a powerful tool
to obtain approximate observations of the signal $\mathbf{X}^\star$ which can
then be used as initialization for a subsequent algorithm, at a low
computational cost. In this paper, we extend and unify previous results and
approaches on spectral methods for the phase retrieval problem. More precisely,
we combine the linearization of message-passing algorithms and the analysis of
the \emph{Bethe Hessian}, a classical tool of statistical physics. Using this
toolbox, we show how to derive optimal spectral methods for arbitrary channel
noise and right-unitarily invariant matrix $\mathbf{\Phi}$, in an automated
manner (i.e. with no optimization over any hyperparameter or preprocessing
function).
Authors' comments: 14 pages + references and appendix. v2: Version updated to match the
one accepted at MSML 2021. v3: Adding a reference to a previous work
mentioning marginal stability and its connection to Bayes-optimality
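To make the idea of a spectral method concrete, the sketch below implements the simplest textbook variant for a real signal and a Gaussian matrix $\mathbf{\Phi}$: it takes the leading eigenvector of $\frac{1}{m}\sum_i y_i^2\, \mathbf{a}_i \mathbf{a}_i^\top$, where $\mathbf{a}_i$ are the rows of $\mathbf{\Phi}$ and $y_i$ the observed magnitudes. This is not the optimal method derived in the paper: the preprocessing $y \mapsto y^2$, the noiseless Gaussian setting, the problem sizes, seeds and function names are all assumptions chosen for illustration.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def norm(u):
    return math.sqrt(dot(u, u))


def spectral_estimate(rows, y, iters=50, seed=0):
    """Leading eigenvector of (1/m) * sum_i y_i^2 a_i a_i^T by power iteration."""
    rng = random.Random(seed)
    m, n = len(rows), len(rows[0])
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    for _ in range(iters):
        w = [0.0] * n
        # Matrix-free product: accumulate y_i^2 * (a_i . v) * a_i over all rows.
        for a, yi in zip(rows, y):
            coeff = yi * yi * dot(a, v) / m
            for k in range(n):
                w[k] += coeff * a[k]
        scale = norm(w)
        v = [x / scale for x in w]
    return v


# Synthetic noiseless instance with m / n = 80 (assumed sizes).
rng = random.Random(1)
n, m = 10, 800
x_star = [rng.gauss(0.0, 1.0) for _ in range(n)]
rows = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]
y = [abs(dot(a, x_star)) for a in rows]

x_hat = spectral_estimate(rows, y)
# The global sign is unrecoverable from magnitudes, so compare up to sign.
overlap = abs(dot(x_hat, x_star)) / norm(x_star)
print(f"overlap with the true signal: {overlap:.3f}")
```

The estimate only recovers the direction of the signal up to a global sign, which is why such methods are typically used to initialize a subsequent iterative algorithm rather than as a final answer.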
Jochem Verrelst, Sara Dethier, Juan Pablo Rivera, Jordi Muñoz-Marí, Gustau Camps-Valls, José Moreno
Kernel-based machine learning regression algorithms (MLRAs) are potentially powerful methods for implementation in operational biophysical variable retrieval schemes. However, they face difficulties in coping with large training datasets. With the increasing amount of optical remote sensing data made available for analysis and the possibility of using a large amount of simulated data from radiative transfer models (RTMs) to train kernel MLRAs, efficient data reduction techniques will need to be implemented. Active learning (AL) methods make it possible to select the most informative samples in a dataset. This letter introduces six AL methods for achieving optimized biophysical variable estimation with a manageable training dataset, and their implementation into a Matlab-based MLRA toolbox for semi-automatic use. The AL methods were analyzed for their efficiency in improving the estimation accuracy of leaf area index and chlorophyll content based on PROSAIL simulations. Each of the implemented methods outperformed random sampling, improving retrieval accuracy with lower sampling rates. Practically, AL methods open opportunities to feed advanced MLRAs with RTM-generated training data for development of operational retrieval models.