
Optimisation of archival processes involving digitisation of typewritten documents

Hrvoje Stančić (Department of Information and Communication Sciences, Faculty of Humanities and Social Sciences, Zagreb, Croatia)
Željko Trbušić (Division for the History of Croatian Literature, Institute for the History of Croatian Literature, Theater and Music, Croatian Academy of Sciences and Arts, Zagreb, Croatia)

Aslib Journal of Information Management

ISSN: 2050-3806

Article publication date: 17 July 2020

Issue publication date: 12 November 2020




The authors investigate optical character recognition (OCR) technology and discuss its implementation in the context of digitisation of archival materials.


The typewritten transcripts of the Croatian Writers' Society from the mid-1960s are used as the test data. The optimal digitisation setup is investigated in order to obtain the best OCR results. This was done using a sample of 123 pages digitised at different resolution settings and binarisation levels.
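As an illustration of what a "binarisation level" can mean in practice, the following is a minimal sketch of global thresholding applied at several levels. The abstract does not specify the binarisation method or the threshold values used, so the function name `binarise`, the sample pixel values and the three thresholds are all assumptions made for this example.

```python
def binarise(pixels, threshold):
    """Map 8-bit greyscale values to 0 (black) or 255 (white).

    Assumption: a single global threshold is applied to every pixel;
    the study itself may have used a different binarisation method.
    """
    return [[255 if value >= threshold else 0 for value in row]
            for row in pixels]


# Hypothetical 2x3 greyscale patch (0 = black, 255 = white).
patch = [[12, 130, 240],
         [90, 128, 200]]

# Produce one binarised variant per (assumed) threshold level.
variants = {t: binarise(patch, t) for t in (100, 128, 160)}

for threshold, image in variants.items():
    print(threshold, image)
```

Each variant could then be passed to the OCR engine so that recognition results can be compared across levels.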


A series of tests showed that different settings produce significantly different results. The best OCR accuracy achieved on the test sample of typewritten documents was 95.02%. The results show that resolution is significantly more important than the binarisation pre-processing procedure for achieving better OCR results.
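The abstract does not state how OCR accuracy was calculated. One common convention, assumed here purely for illustration, is character-level accuracy: one minus the edit distance between the OCR output and a manually verified reference text, divided by the length of the reference. The function names below are hypothetical.

```python
def levenshtein(a, b):
    """Return the edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,         # deletion
                            curr[j - 1] + 1,     # insertion
                            prev[j - 1] + cost)) # substitution or match
        prev = curr
    return prev[-1]


def character_accuracy(reference, ocr_output):
    """Character-level accuracy in percent (an assumed metric)."""
    if not reference:
        raise ValueError("reference text must not be empty")
    errors = levenshtein(reference, ocr_output)
    # Clamp at zero: very noisy output can contain more errors
    # than the reference has characters.
    return max(0.0, 1 - errors / len(reference)) * 100


# Hypothetical example: one wrong character out of four.
print(character_accuracy("abcd", "abxd"))
```

Under this definition, a figure such as 95.02% would mean roughly five character errors per hundred reference characters.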


Based on the research results, the authors give recommendations for setting up an optimal digitisation process with the aim of increasing the quality of OCR results. Finally, the authors place the research results in the context of the digitisation of cultural heritage in general and discuss possibilities for further investigation.



Stančić, H. and Trbušić, Ž. (2020), "Optimisation of archival processes involving digitisation of typewritten documents", Aslib Journal of Information Management, Vol. 72 No. 4, pp. 545-559.



Emerald Publishing Limited

Copyright © 2020, Emerald Publishing Limited
