This study aims to provide a system capable of static searching on a large number of unstructured texts directly on the Web domain while keeping costs to a minimum. The proposed framework is applied to the unstructured texts of Migne’s Patrologia Graeca (PG) collection, setting PG as an implementation example of the method.
The unstructured texts of PG have automatically transformed to a read-only not only Structured Query Language (NoSQL) database with a structure identical to that of a representational state transfer access point interface. The transformation makes it possible to execute queries and retrieve ranked results based on a specialized application of the extended Boolean model.
Using a specifically built Web-browser-based search tool, the user can quickly locate ranked relevant fragments of texts with the ability to navigate back and forth. The user can search using the initial part of words and by ignoring the diacritics of the Greek language. The performance of the search system is comparatively examined when different versions of hypertext transfer protocol (Http) are used for various network latencies and different modes of network connections. Queries using Http-2 have by far the best performance, compared to any of Http-1.1 modes.
The system is not limited to the case study of PG and has a generic application in the field of humanities. The expandability of the system in terms of semantic enrichment is feasible by taking into account synonyms and topics if they are available. The system’s main advantage is that it is totally static which implies important features such as simplicity, efficiency, fast response, portability, security and scalability.
Varthis, E., Poulos, M., Giarenis, I. and Papavlasopoulos, S. (2021), "A novel framework for delivering static search capabilities to large textual corpora directly on the Web domain: an implementation for Migne’s Patrologia Graeca", International Journal of Web Information Systems, Vol. 17 No. 3, pp. 153-186. https://doi.org/10.1108/IJWIS-10-2020-0062
Emerald Publishing Limited
Copyright © 2021, Emerald Publishing Limited