Search results

1 – 10 of 503
Open Access
Article
Publication date: 28 November 2017

Mansoor Alghamdi and William Teahan



Abstract

Purpose

The aim of this paper is to experimentally evaluate the effectiveness of the state-of-the-art printed Arabic text recognition systems to determine open areas for future improvements. In addition, this paper proposes a standard protocol with a set of metrics for measuring the effectiveness of Arabic optical character recognition (OCR) systems to assist researchers in comparing different Arabic OCR approaches.

Design/methodology/approach

This paper describes an experiment to automatically evaluate four well-known Arabic OCR systems using a set of performance metrics. The evaluation experiment is conducted on a publicly available printed Arabic dataset comprising 240 text images with a variety of resolution levels, font types, font styles and font sizes.
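The abstract does not name the metrics in the proposed protocol. As a minimal sketch of the kind of measure commonly used to score OCR output, the Python below computes character and word accuracy from the Levenshtein edit distance against a ground-truth transcription. The function names and sample strings are illustrative assumptions, not taken from the paper.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: insertions, deletions and substitutions cost 1 each."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion from ref
                            curr[j - 1] + 1,      # insertion into ref
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def character_accuracy(ref: str, hyp: str) -> float:
    """1 - (edit distance / reference length), counted in Unicode code points."""
    if not ref:
        raise ValueError("reference text must not be empty")
    return max(0.0, 1.0 - edit_distance(ref, hyp) / len(ref))


def word_accuracy(ref: str, hyp: str) -> float:
    """Same measure computed over whitespace-separated words."""
    ref_words, hyp_words = ref.split(), hyp.split()
    if not ref_words:
        raise ValueError("reference text must not be empty")
    return max(0.0, 1.0 - edit_distance(ref_words, hyp_words) / len(ref_words))


# Hypothetical ground truth and OCR output with one wrong final letter.
print(character_accuracy("كتاب", "كتات"))
```

Counting code points treats Arabic diacritics as separate characters; an evaluation that should ignore them would need to normalise both strings first.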

Findings

The experimental results show that the field of character recognition for printed Arabic still requires further research to reach an efficient text recognition method for Arabic script.

Originality/value

To the best of the authors’ knowledge, this is the first work that provides a comprehensive automated evaluation of Arabic OCR systems with respect to the characteristics of Arabic script and, in addition, proposes an evaluation methodology that can be used as a benchmark by researchers and therefore will contribute significantly to the enhancement of the field of Arabic script recognition.

Details

PSU Research Review, vol. 1 no. 3
Type: Research Article
ISSN: 2399-1747


Article
Publication date: 3 June 2014

Jim Hahn



Abstract

Purpose

The purpose of this paper is to report results of a formative usability study that investigated first-year student use of an optical character recognition (OCR) mobile application (app) designed to help students find resources for course assignments. The app uses textual content from the assignment sheet to suggest relevant library resources of which students may not be aware.

Design/methodology/approach

Formative evaluation data are collected to inform the production-level version of the mobile application and to understand student use models and requirements for OCR software in mobile applications.

Findings

Mobile OCR apps are helpful for undergraduate students searching known titles of books, general subject areas or searching for help guide content developed by the library. The results section details how student feedback shaped the next iteration of the app for integration as a Minrva module.

Research limitations/implications

This usability paper is not a large-scale quantitative study, but seeks to provide deep qualitative research data for the specific mobile interface studied, the Text-shot prototype.

Practical implications

The OCR application is designed to help students learn about availability of library resources based on scanning (e.g. taking a picture, or “Text-shot”) of an assignment sheet, a course syllabus or other course-related handouts.

Originality/value

This study contributes a new area of application development for libraries, with research methods that are useful for other mobile development studies.

Details

Reference Services Review, vol. 42 no. 2
Type: Research Article
ISSN: 0090-7324


Article
Publication date: 17 July 2020

Hrvoje Stančić and Željko Trbušić


Abstract

Purpose

The authors investigate optical character recognition (OCR) technology and discuss its implementation in the context of digitisation of archival materials.

Design/methodology/approach

The typewritten transcripts of the Croatian Writers' Society from the mid-1960s are used as the test data. The optimal digitisation setup is investigated in order to obtain the best OCR results, using a sample of 123 pages digitised at different resolution settings and binarisation levels.
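The abstract does not say which binarisation method was used. Assuming a simple global threshold on 8-bit greyscale pixels, the effect of different binarisation levels can be sketched as follows; `binarise`, `black_ratio` and the tiny sample page are hypothetical.

```python
def binarise(gray, threshold):
    """Map each 8-bit greyscale pixel to black (0) or white (255) at a global threshold."""
    if not 0 <= threshold <= 255:
        raise ValueError("threshold must be in the range 0..255")
    return [[255 if px >= threshold else 0 for px in row] for row in gray]


def black_ratio(binary):
    """Fraction of pixels that ended up black after binarisation."""
    pixels = [px for row in binary for px in row]
    return pixels.count(0) / len(pixels)


# Hypothetical 2x3 greyscale patch; a real page would come from a scanner.
page = [
    [30, 120, 200],
    [90, 160, 250],
]

# Higher levels turn more mid-grey pixels black, thickening the strokes.
for level in (100, 150, 200):
    print(level, black_ratio(binarise(page, level)))
```

In a study of this kind, each binarised variant would then be passed to the OCR engine and scored against a ground-truth transcription.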

Findings

A series of tests showed that different settings produce significantly different results. The best OCR accuracy achieved on the test sample of typewritten documents was 95.02%. The results show that resolution is significantly more important than the binarisation pre-processing procedure for achieving better OCR results.

Originality/value

Based on the research results, the authors give recommendations for achieving an optimal digitisation process setup with the aim of increasing the quality of OCR results. Finally, the authors put the research results in the context of digitisation of cultural heritage in general and discuss further investigation possibilities.

Details

Aslib Journal of Information Management, vol. 72 no. 4
Type: Research Article
ISSN: 2050-3806


Article
Publication date: 31 August 2012

Tobias Blanke, Michael Bryant and Mark Hedges



Abstract

Purpose

This paper aims to present an evaluation of open source OCR for supporting research on material in small‐ to medium‐scale historical archives.

Design/methodology/approach

The approach was to develop a workflow engine to support the easy customisation of the OCR process towards the historical materials using open source technologies. Commercial OCR often fails to deliver sufficient results here, as its processing is optimised towards large‐scale commercially relevant collections. The approach presented here allows users to combine the most effective parts of different OCR tools.

Findings

The authors demonstrate their application and its flexibility and present two case studies, which demonstrate how OCR can be embedded into wider digitally enabled historical research. The first case study produces high‐quality research‐oriented digitisation outputs, utilizing services that the authors developed to allow for the direct linkage of digitisation image and OCR output. The second case study demonstrates what becomes possible if OCR can be customised directly within a larger research infrastructure for history. In such a scenario, further semantics can be added easily to the workflow, enhancing the research browse experience significantly.

Originality/value

There has been little work on the use of open source OCR technologies for historical research. This paper demonstrates that the authors' workflow approach allows users to combine commercial engines' ability to read a wider range of character sets with the flexibility of open source tools in terms of customisable pre‐processing and layout analysis. All this can be done without the need to develop dedicated code.

Details

Journal of Documentation, vol. 68 no. 5
Type: Research Article
ISSN: 0022-0418


Open Access
Article
Publication date: 23 May 2023

Kimmo Kettunen, Heikki Keskustalo, Sanna Kumpulainen, Tuula Pääkkönen and Juha Rautiainen


Abstract

Purpose

This study aims to identify user perception of different qualities of optical character recognition (OCR) in texts. The purpose of this paper is to study the effect of different quality OCR on users' subjective perception through an interactive information retrieval task with a collection of one digitized historical Finnish newspaper.

Design/methodology/approach

This study is based on the simulated work task model used in interactive information retrieval. Thirty-two users searched an article collection of the Finnish newspaper Uusi Suometar (1869–1918), which consists of ca. 1.45 million autosegmented articles. The article search database had two versions of each article with different quality OCR. Each user performed six pre-formulated and six self-formulated short queries and subjectively evaluated the top 10 results using a graded relevance scale of 0–3. Users were not informed about the OCR quality differences of the otherwise identical articles.

Findings

The main result of the study is that improved OCR quality affects subjective user perception of historical newspaper articles positively: higher relevance scores are given to better-quality texts.

Originality/value

To the best of the authors’ knowledge, this simulated interactive work task experiment is the first one showing empirically that users' subjective relevance assessments are affected by a change in the quality of an optically read text.

Details

Journal of Documentation, vol. 79 no. 7
Type: Research Article
ISSN: 0022-0418


Article
Publication date: 1 March 1971

Andrew Robertson


Abstract

The idea of optical character recognition (OCR), in other words the “reading” of documents by other than human means, arose as a practical proposition during the Second World War. Wartime experience of using computers in the United States had revealed the contrasts in speeds between the transcription of documents to be processed (at that time the punching of cards or tape by operatives working from original documents) and the central processing within the computer itself. Visual output was also slower than central processing but was much speeded up by the introduction of line printers and later of xerography. This “paired” case study, part of a project sponsored by the Science Research Council to examine patterns of success and failure in industrial innovation, is confined to two attempts to innovate in the field of OCR. There were others, one or two of which were contemporary, most of which have followed, have a much more recent history and may be thought to have overtaken, in terms of market penetration, the innovation here designated a commercial success. The point of this study when it was undertaken was to extract data about the two innovations that would be suitable for general analysis by a computer programme designed to search out significant groups of explanatory factors so that the characteristics associated with innovative success might be recognised as typical within an industry, or perhaps generally. This study belongs to one of two groups, the instrument industry, the other group investigated being chemical manufacturing.

Details

Management Decision, vol. 9 no. 3
Type: Research Article
ISSN: 0025-1747

Article
Publication date: 31 July 2020

Zainab Akhtar, Jong Weon Lee, Muhammad Attique Khan, Muhammad Sharif, Sajid Ali Khan and Naveed Riaz


Abstract

Purpose

In artificial intelligence, optical character recognition (OCR) is an active research area, driven by well-known applications such as the automated conversion of printed documents into machine-readable text. The major purpose of OCR in academia and banks is to achieve significant performance and save storage space.

Design/methodology/approach

A novel technique is proposed for automated OCR based on multi-properties feature fusion and selection. The features are fused using a serial formulation, and the output is passed to a partial least squares (PLS) based selection method. The selection is done based on an entropy fitness function. The final features are classified by an ensemble classifier.
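The abstract gives no implementation details, so the following is only a rough sketch under stated assumptions. It reads serial fusion as concatenating each sample's feature vectors end to end, and it swaps in a plain entropy ranking of the fused columns in place of the paper's PLS-based selection. The ensemble classifier is not shown, and every function name and value here is hypothetical.

```python
import math


def serial_fusion(*feature_sets):
    """Concatenate, sample by sample, the vectors produced by several extractors."""
    n = len(feature_sets[0])
    if any(len(fs) != n for fs in feature_sets):
        raise ValueError("all feature sets must describe the same samples")
    return [sum((list(fs[i]) for fs in feature_sets), []) for i in range(n)]


def column_entropy(values, bins=4):
    """Shannon entropy (bits) of one feature column after equal-width binning."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0  # a constant feature carries no information
    counts = [0] * bins
    for v in values:
        idx = min(int((v - lo) / (hi - lo) * bins), bins - 1)
        counts[idx] += 1
    total = len(values)
    return -sum(c / total * math.log2(c / total) for c in counts if c)


def select_by_entropy(fused, k):
    """Keep the k columns with the highest entropy; return their indices and the reduced data."""
    n_features = len(fused[0])
    scores = [column_entropy([row[j] for row in fused]) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    keep = sorted(ranked[:k])
    return keep, [[row[j] for j in keep] for row in fused]


# Hypothetical features for four character images from two extractors.
shape_features = [[1, 9], [2, 8], [3, 7], [4, 6]]
texture_features = [[5], [5], [5], [5]]  # constant, so it should be dropped

fused = serial_fusion(shape_features, texture_features)
kept, reduced = select_by_entropy(fused, k=2)
print(kept, reduced[0])
```

Concatenation keeps every extractor's features available to the selection step, at the cost of a longer vector, which is why a selection stage usually follows it.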

Findings

The presented method was extensively tested on two datasets, the authors' proposed dataset and the Chars74k benchmark, and achieved accuracies of 91.2 and 99.9%. Compared with existing techniques, the proposed method gives improved performance.

Originality/value

The technique presented in this work will help with license plate recognition and with the conversion of printed documents to machine-readable text.

Details

Journal of Enterprise Information Management, vol. 36 no. 3
Type: Research Article
ISSN: 1741-0398


Article
Publication date: 1 January 1989

Clyde W. Grotophorst


Abstract

Optical character recognition (OCR) technology can be employed to produce an ASCII‐text database for mounting on computer systems. Current technologies and principles of scanning and OCR are discussed. A prototypical “local” project—the creation of a full‐text database of dissertations done at George Mason University—has been undertaken by the Fenwick Library at that institution. Problems encountered with current scanning and OCR technologies are illustrated and discussed, as well as techniques and “filter” programs developed to streamline the scanning and OCR conversion process.

Details

Library Hi Tech, vol. 7 no. 1
Type: Research Article
ISSN: 0737-8831

Article
Publication date: 16 October 2018

Rajeswari S. and Sai Baba Magapu


Abstract

Purpose

The purpose of this paper is to develop a text extraction tool for scanned documents that would extract text and build the keywords corpus and key phrases corpus for the document without manual intervention.

Design/methodology/approach

For text extraction from scanned documents, a Web-based optical character recognition (OCR) tool was developed. Because OCR is a well-established technology, Microsoft Office document imaging tools were used to build it. To account for the commonly encountered problem of skew, a method to detect and correct skew in the scanned documents was developed and integrated with the tool. The OCR tool was customized to build a keywords corpus and a key phrases corpus for every document.

Findings

The developed tool was evaluated using a 100-document corpus to test the various properties of OCR. The tool had above 99 per cent word read accuracy for text-only image documents. The customization of the OCR was tested with samples of microfiches, journal pages from back volumes and newspaper clips, and the results are discussed in the summary. The tool was found to be useful for text extraction and processing.

Social implications

The scanned documents are converted to a keywords and key phrases corpus. The tool could be used to build metadata for scanned documents without manual intervention.

Originality/value

The tool is used to convert unstructured data (in the form of image documents) to structured data (the document is converted into a keywords and key phrases database). In addition, the image document is converted to an editable and searchable document.

Details

The Electronic Library, vol. 36 no. 5
Type: Research Article
ISSN: 0264-0473


Article
Publication date: 1 January 1993

John Mackrory


Abstract

Optical character recognition (OCR) is a vital tool for the food and pharmaceutical industries, allowing them to inspect for correct labelling and thereby conform to good manufacturing practices (GMP).

Details

Sensor Review, vol. 13 no. 1
Type: Research Article
ISSN: 0260-2288
