Article
Publication date: 17 July 2020

Hrvoje Stančić and Željko Trbušić

Abstract

Purpose

The authors investigate optical character recognition (OCR) technology and discuss its implementation in the context of digitisation of archival materials.

Design/methodology/approach

Typewritten transcripts of the Croatian Writers' Society from the mid-1960s are used as the test data. The optimal digitisation setup is investigated in order to obtain the best OCR results, using a sample of 123 pages digitised at different resolution settings and binarisation levels.
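
As an illustration of what varying the binarisation level means, the following is a minimal sketch that assumes a simple global threshold on 8-bit grayscale values. The function name `binarise`, the sample patch and the threshold values are hypothetical; the abstract does not state which binarisation method or levels the authors used.

```python
def binarise(gray, threshold):
    """Map each 8-bit grayscale pixel to black (0) or white (255).

    Assumption: a single global threshold; pixels at or above it become white.
    """
    return [[255 if px >= threshold else 0 for px in row] for row in gray]


# Hypothetical 2x3 grayscale patch (0 = black, 255 = white).
patch = [
    [30, 120, 200],
    [90, 160, 250],
]

# Produce one binarised version per (hypothetical) threshold level.
levels = {t: binarise(patch, t) for t in (100, 150, 200)}

for t, image in levels.items():
    print(t, image)
```

A higher threshold turns more mid-grey pixels black, which can thicken faint typewriter strokes but can also merge neighbouring characters.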

Findings

A series of tests showed that different settings produce significantly different results. The best OCR accuracy achieved on the test sample of typewritten documents was 95.02%. The results show that resolution is significantly more important than the binarisation pre-processing procedure for achieving better OCR results.
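
The abstract does not say how OCR accuracy was measured. One common convention, assumed here purely for illustration, is character-level accuracy: one minus the edit distance between the OCR output and a reference transcription, divided by the length of the reference. The function names and sample strings below are hypothetical.

```python
def levenshtein(a, b):
    """Return the edit distance (insertions, deletions, substitutions) from a to b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def char_accuracy(reference, ocr_output):
    """Character accuracy in [0, 1], relative to the reference length."""
    if not reference:
        return 1.0 if not ocr_output else 0.0
    errors = levenshtein(reference, ocr_output)
    return max(0.0, 1.0 - errors / len(reference))


# Hypothetical reference transcription and OCR output.
reference = "typewritten transcript"
ocr_output = "typewrltten transcrlpt"
print(f"{char_accuracy(reference, ocr_output):.2%}")
```

Under this assumed definition, accuracy figures for scans made at different resolutions or binarisation levels can be compared directly, provided the same reference text is used for each.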

Originality/value

Based on the research results, the authors give recommendations for setting up an optimal digitisation process, with the aim of increasing the quality of OCR results. Finally, the authors place the research results in the context of the digitisation of cultural heritage in general and discuss possibilities for further investigation.

Details

Aslib Journal of Information Management, vol. 72 no. 4
Type: Research Article
ISSN: 2050-3806
