
LoGE: an unsupervised local-global document extension generation in information retrieval for long documents

Oussama Ayoub (Research Center, Léonard de Vinci Pôle Universitaire, Paris, France and Seville More Helory, Courbevoie, France)
Christophe Rodrigues (Research Center, Léonard de Vinci Pôle Universitaire, Paris, France)
Nicolas Travers (Research Center, Léonard de Vinci Pôle Universitaire, Paris, France)

International Journal of Web Information Systems

ISSN: 1744-0084

Article publication date: 8 September 2023

Issue publication date: 28 November 2023


Abstract

Purpose

This paper aims to address the word gap in information retrieval (IR), especially for long documents belonging to specific domains. With the continuous growth of text data that modern IR systems have to manage, efficient solutions are needed to find the best set of documents for a given request. The words used to express a query can differ from those used in the relevant documents. Even when the meanings are close, nonoverlapping words are challenging for IR systems. This word gap becomes significant for long documents from specific domains.

Design/methodology/approach

To generate new words for a document, a deep learning (DL) masked language model is used to infer related words. The DL models used are pretrained on massive text data and carry general or domain-specific knowledge, which helps to propose a better document representation.
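The idea of inferring related words with a masked language model can be illustrated with a minimal sketch. It is not the authors' implementation: the passage splitting, the prompt wording, the vote-based aggregation, and every name below (`expand_document`, `split_passages`, `toy_predictor`) are assumptions made for illustration. A hard-coded stand-in replaces the pretrained model so that the example runs without downloading anything.

```python
from collections import Counter
from typing import Callable, List

MASK = "[MASK]"  # mask token; the actual token depends on the model (assumption)


def split_passages(text: str, size: int) -> List[str]:
    """Cut a long document into consecutive passages of `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def expand_document(
    text: str,
    predict_masked: Callable[[str], List[str]],
    passage_size: int = 50,
    top_k: int = 5,
) -> List[str]:
    """Return up to `top_k` new words proposed for the whole document.

    Each passage is turned into a prompt ending with a masked slot; the
    predictor proposes words for that slot, and proposals are counted
    across all passages. Words already in the document are ignored.
    """
    seen = {w.lower().strip(".,;:") for w in text.split()}
    votes: Counter = Counter()
    for passage in split_passages(text, passage_size):
        prompt = f"{passage} This text is about {MASK}."  # hypothetical prompt
        for word in predict_masked(prompt):
            word = word.lower()
            if word not in seen:
                votes[word] += 1
    return [word for word, _ in votes.most_common(top_k)]


def toy_predictor(prompt: str) -> List[str]:
    """Hypothetical stand-in for a pretrained masked language model."""
    table = {"tenant": ["lease", "housing"], "court": ["law", "justice"]}
    proposals: List[str] = []
    for cue, words in table.items():
        if cue in prompt.lower():
            proposals.extend(words)
    return proposals


doc = "The tenant must notify the landlord. The court may order the tenant to pay."
expansion = expand_document(doc, toy_predictor, passage_size=7, top_k=3)
print(expansion)
```

In practice, `toy_predictor` would be replaced by a call to a pretrained model, for example a wrapper around the Hugging Face `transformers` fill-mask pipeline that returns the `token_str` of each candidate. That choice of library is an assumption here and is not stated in the abstract. The returned words could then be appended to the document before indexing.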

Findings

The authors evaluate the approach on specific IR domains with long documents to show the genericity of the proposed model, and achieve encouraging results.

Originality/value

In this paper, to the best of the authors’ knowledge, an original unsupervised and modular IR system based on recent DL methods is introduced.

Acknowledgements

This work has been funded by Seville More Hélory, Chaire LegalCluster at ESILV.

Since acceptance of this article, the following authors have updated their affiliations: Christophe Rodrigues and Nicolas Travers are at the Léonard de Vinci Pôle Universitaire, Research Center, Paris La Défense, France.

Citation

Ayoub, O., Rodrigues, C. and Travers, N. (2023), "LoGE: an unsupervised local-global document extension generation in information retrieval for long documents", International Journal of Web Information Systems, Vol. 19 No. 5/6, pp. 244-262. https://doi.org/10.1108/IJWIS-07-2023-0109

Publisher

Emerald Publishing Limited

Copyright © 2023, Emerald Publishing Limited
