A novel framework for delivering static search capabilities to large textual corpora directly on the Web domain: an implementation for Migne’s Patrologia Graeca

Evagelos Varthis (Department of Archives, Library Science and Museology, Ionian University, Corfu, Greece)
Marios Poulos (Department of Archives, Library Science and Museology, Ionian University, Corfu, Greece)
Ilias Giarenis (Department of History, Ionian University, Corfu, Greece)
Sozon Papavlasopoulos (Department of Archives, Library Science and Museology, Ionian University, Corfu, Greece)

International Journal of Web Information Systems

ISSN: 1744-0084

Article publication date: 19 May 2021

Issue publication date: 27 July 2021

Abstract

Purpose

This study aims to provide a system capable of static searching on a large number of unstructured texts directly on the Web domain while keeping costs to a minimum. The proposed framework is applied to the unstructured texts of Migne’s Patrologia Graeca (PG) collection, setting PG as an implementation example of the method.

Design/methodology/approach

The unstructured texts of PG have been automatically transformed into a read-only Not only Structured Query Language (NoSQL) database whose structure is identical to that of a representational state transfer (REST) access point interface. The transformation makes it possible to execute queries and retrieve ranked results based on a specialized application of the extended Boolean model.
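As a rough illustration of how ranking under the extended Boolean model can work, the sketch below implements the standard p-norm scoring formulas for AND and OR queries. It is not the authors' specialized variant, which the abstract does not describe; the function names, the fragment identifiers, the term weights and the choice of p = 2 are all assumptions made for the example.

```python
# Minimal sketch of the standard p-norm extended Boolean model.
# All names, weights, and the default p are hypothetical; the paper's
# "specialized application" may differ from these textbook formulas.

def or_score(weights, p=2.0):
    """Similarity of a fragment to the query (t1 OR t2 OR ...).

    weights: term weights in [0, 1], one per query term.
    """
    n = len(weights)
    return (sum(w ** p for w in weights) / n) ** (1.0 / p)


def and_score(weights, p=2.0):
    """Similarity of a fragment to the query (t1 AND t2 AND ...)."""
    n = len(weights)
    return 1.0 - (sum((1.0 - w) ** p for w in weights) / n) ** (1.0 / p)


def rank(fragments, score=and_score, p=2.0):
    """Return fragment ids sorted by descending score.

    fragments: dict mapping a fragment id to its list of term weights.
    """
    return sorted(fragments, key=lambda fid: score(fragments[fid], p), reverse=True)


if __name__ == "__main__":
    # Hypothetical weights of two query terms in three text fragments.
    fragments = {
        "frag-a": [1.0, 1.0],  # both terms fully present
        "frag-b": [1.0, 0.0],  # only the first term present
        "frag-c": [0.0, 0.0],  # neither term present
    }
    print(rank(fragments))
```

With p = 1 both formulas reduce to the mean of the weights, and as p grows they approach strict Boolean behaviour, which is why the model can return a graded ranking rather than a plain match or no match.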

Findings

Using a purpose-built Web-browser-based search tool, the user can quickly locate ranked relevant fragments of text and navigate back and forth between them. The user can search by the initial part of a word and can ignore the diacritics of the Greek language. The performance of the search system is compared across different versions of the Hypertext Transfer Protocol (HTTP), for various network latencies and different modes of network connection. Queries over HTTP/2 perform by far the best compared with any of the HTTP/1.1 modes.
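One plausible way to support prefix search that ignores Greek diacritics is to fold both the query and the indexed words to a bare, lowercase form before comparing them. The sketch below does this with Unicode decomposition from Python's standard library; the function names are hypothetical and the abstract does not state how the authors' tool performs this step.

```python
import unicodedata

# Minimal sketch of diacritic-insensitive prefix matching for Greek.
# fold, prefix_match and prefix_search are hypothetical names; this is an
# assumed approach, not the implementation described in the paper.

def fold(text):
    """Strip combining marks (accents, breathings) and fold case."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # casefold also maps final sigma to the regular sigma.
    return stripped.casefold()


def prefix_match(prefix, word):
    """True if word starts with prefix once both are folded."""
    return fold(word).startswith(fold(prefix))


def prefix_search(prefix, words):
    """Return the words that match the prefix, in their original order."""
    return [w for w in words if prefix_match(prefix, w)]


if __name__ == "__main__":
    vocabulary = ["ἄνθρωπος", "ἀνθρώπινος", "λόγος"]
    print(prefix_search("ανθρ", vocabulary))
```

In a fully static setting, the folding would typically be applied once when the index is built, so that the browser only has to fold the user's query before looking it up.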

Originality/value

The system is not limited to the case study of PG and is generically applicable in the field of humanities. It can be extended with semantic enrichment by taking into account synonyms and topics, where these are available. The system's main advantage is that it is totally static, which implies important features such as simplicity, efficiency, fast response, portability, security and scalability.

Citation

Varthis, E., Poulos, M., Giarenis, I. and Papavlasopoulos, S. (2021), "A novel framework for delivering static search capabilities to large textual corpora directly on the Web domain: an implementation for Migne’s Patrologia Graeca", International Journal of Web Information Systems, Vol. 17 No. 3, pp. 153-186. https://doi.org/10.1108/IJWIS-10-2020-0062

Publisher

Emerald Publishing Limited

Copyright © 2021, Emerald Publishing Limited