
Integrated use of KOS and deep learning for data set annotation in tourism domain

Giovanna Aracri (Institute for Informatics and Telematics of National Research Council (IIT-CNR), Rende, Italy)
Antonietta Folino (Department of Culture, Education and Society, University of Calabria, Rende, Italy)
Stefano Silvestri (Institute for High Performance Computing and Networking of National Research Council (ICAR-CNR), Naples, Italy)

Journal of Documentation

ISSN: 0022-0418

Article publication date: 2 May 2023

Issue publication date: 24 October 2023


Abstract

Purpose

The purpose of this paper is to propose a methodology for the enrichment and tailoring of a knowledge organization system (KOS), in order to support the information extraction (IE) task for the analysis of documents in the tourism domain. In particular, the KOS is used to develop a named entity recognition (NER) system.

Design/methodology/approach

First, a method is presented to improve and customize an existing thesaurus by leveraging documents related to tourism in Italy. The resulting thesaurus is then used to create an annotated NER corpus, exploiting distant supervision, deep learning and light human supervision.
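To illustrate what distant supervision from a thesaurus can look like, the following is a minimal sketch, not the authors' pipeline. It assumes a hypothetical thesaurus fragment mapping surface forms to entity types, a BIO tagging scheme, and greedy longest-match lookup over whitespace tokens. All of these are illustrative assumptions. The deep learning model and human review steps mentioned above are not shown.

```python
from typing import Dict, List, Tuple

# Hypothetical thesaurus fragment: surface form -> entity type.
# Entries and labels are illustrative assumptions, not the paper's KOS.
THESAURUS: Dict[str, str] = {
    "amalfi coast": "LOCATION",
    "naples": "LOCATION",
    "bed and breakfast": "ACCOMMODATION",
    "pizza": "FOOD",
}


def distant_bio_tags(tokens: List[str], thesaurus: Dict[str, str]) -> List[Tuple[str, str]]:
    """Label tokens with BIO tags by greedy longest match against thesaurus terms."""
    # Pre-split each term into lowercase tokens and record the longest term length.
    term_tokens = {tuple(term.lower().split()): label for term, label in thesaurus.items()}
    max_len = max((len(t) for t in term_tokens), default=0)
    lowered = [tok.lower() for tok in tokens]
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        match = None
        # Try the longest candidate span first so multi-word terms win.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            label = term_tokens.get(tuple(lowered[i:i + n]))
            if label is not None:
                match = (n, label)
                break
        if match is None:
            i += 1
            continue
        n, label = match
        tags[i] = f"B-{label}"
        for j in range(i + 1, i + n):
            tags[j] = f"I-{label}"
        i += n
    return list(zip(tokens, tags))


if __name__ == "__main__":
    sentence = "We booked a bed and breakfast near the Amalfi Coast".split()
    for token, tag in distant_bio_tags(sentence, THESAURUS):
        print(token, tag)
```

Labels produced this way are typically noisy: synonyms missing from the thesaurus go untagged, and ambiguous terms may be tagged wrongly. A trained model and a light human check can help correct such errors.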

Findings

The study shows that a customized KOS can effectively support IE tasks when applied to documents of the same domains and types used for its construction. Moreover, combined with the proposed methodology, it eases the annotation task, allowing a corpus to be annotated with a fraction of the effort required for manual annotation.

Originality/value

The paper explores an alternative use of a KOS, proposing an innovative NER corpus annotation methodology. Moreover, the KOS and the annotated NER data set will be made publicly available.


Acknowledgements

The authors contributed equally to this work; however, Giovanna Aracri focused particularly on “Introduction”, “Related Works” and “Conclusions and Future Works”; Antonietta Folino on “Motivations” and “Thesaurus construction”; and Stefano Silvestri on “Iterative NER Corpus Annotation”, “Experiments” and “Results and Discussion”.

Funding: The authors acknowledge the financial support provided by POR Campania FESR 2014/2020, project STOP (“a Smart TOurism Platform”), Axis and Objective: Axis 1, Research and Innovation; Specific Objective 1.1, Increase in the innovation activity of enterprises.

Citation

Aracri, G., Folino, A. and Silvestri, S. (2023), "Integrated use of KOS and deep learning for data set annotation in tourism domain", Journal of Documentation, Vol. 79 No. 6, pp. 1440-1458. https://doi.org/10.1108/JD-02-2023-0019

Publisher: Emerald Publishing Limited

Copyright © 2023, Emerald Publishing Limited
