Optimisation of archival processes involving digitisation of typewritten documents
Aslib Journal of Information Management
ISSN: 2050-3806
Article publication date: 17 July 2020
Issue publication date: 12 November 2020
Abstract
Purpose
The authors investigate optical character recognition (OCR) technology and discuss its implementation in the context of digitisation of archival materials.
Design/methodology/approach
The test data are typewritten transcripts of the Croatian Writers' Society from the mid-1960s. The optimal digitisation setup is investigated in order to obtain the best OCR results, using a sample of 123 pages digitised at different resolution settings and binarisation levels.
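As a rough illustration of what a binarisation level means in this setting, the sketch below applies a fixed global threshold to a grayscale image held as nested lists. The function name `binarise`, the 0-255 pixel range and the sample values are assumptions made for illustration; the abstract does not state which software or threshold values the authors used.

```python
def binarise(gray, threshold):
    """Map each grayscale pixel (0-255) to black (0) or white (255).

    Pixels at or above `threshold` become white; all others become black.
    """
    return [[255 if px >= threshold else 0 for px in row] for row in gray]


# Hypothetical 2x3 patch of a scanned page (not data from the study).
patch = [[12, 130, 240],
         [200, 90, 128]]

low = binarise(patch, 100)   # lower level: more pixels turn white
high = binarise(patch, 180)  # higher level: more pixels turn black
```

Raising the threshold turns more mid-grey pixels black, which can thicken faint typewriter strokes but also amplifies background noise.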
Findings
A series of tests showed that different settings produce significantly different results. The best OCR accuracy achieved on the test sample of typewritten documents was 95.02%. The results show that resolution is significantly more important than the binarisation pre-processing procedure for achieving better OCR results.
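The abstract does not say how OCR accuracy was calculated. One common definition is character-level accuracy based on edit distance between a manually verified transcription and the OCR output; the sketch below assumes that definition, and the names `edit_distance` and `char_accuracy` are hypothetical helpers rather than the authors' method.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def char_accuracy(reference, ocr_output):
    """Character accuracy in percent, relative to the reference length."""
    if not reference:
        raise ValueError("reference text must not be empty")
    errors = edit_distance(reference, ocr_output)
    # Clamp at zero: a very noisy output can contain more errors than characters.
    return max(0.0, 100.0 * (len(reference) - errors) / len(reference))
```

Under this definition, a result such as 95.02% would mean roughly five character errors per hundred reference characters.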
Originality/value
Based on the research results, the authors give recommendations for an optimal digitisation process setup, with the aim of increasing the quality of OCR results. Finally, the authors place the research results in the context of the digitisation of cultural heritage in general and discuss possibilities for further investigation.
Citation
Stančić, H. and Trbušić, Ž. (2020), "Optimisation of archival processes involving digitisation of typewritten documents", Aslib Journal of Information Management, Vol. 72 No. 4, pp. 545-559. https://doi.org/10.1108/AJIM-11-2019-0326
Publisher
Emerald Publishing Limited
Copyright © 2020, Emerald Publishing Limited