Search results

1 – 2 of 2
Article
Publication date: 29 April 2014

Mohammad Amin Shayegan and Saeed Aghabozorgi

Abstract

Purpose

Pattern recognition systems often have to handle large training data sets that include duplicate and similar training samples. This increases both the memory required to store and process the data and the time complexity of training algorithms. The purpose of the paper is to reduce the volume of the training part of a data set, in order to increase system speed without any significant decrease in system accuracy.

Design/methodology/approach

A new technique for data set size reduction, based on a version of the modified frequency diagram approach, is presented. To reduce processing time, the proposed method compares the samples of a class with other samples in the same class, instead of comparing samples from different classes. It removes only those patterns that are similar to the generated class template in each class. No feature extraction operation was carried out, so that the proposed data size reduction technique could be assessed more precisely.
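The within-class comparison described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' algorithm: the abstract does not specify how the class template is built, how similarity is measured, or what removal criterion is applied. Here the template is assumed to be the per-pixel ink frequency over flattened binary images, and the names `class_template`, `similarity`, `reduce_class` and the `threshold` parameter are all hypothetical.

```python
from typing import List, Sequence

# Assumption: each sample is a flattened binary image (1 = ink, 0 = background).
Image = Sequence[int]


def class_template(samples: List[Image]) -> List[float]:
    """Per-pixel ink frequency across all samples of one class (assumed template)."""
    n = len(samples)
    return [sum(pixels) / n for pixels in zip(*samples)]


def similarity(sample: Image, template: List[float]) -> float:
    """Mean per-pixel agreement between a sample and the template, in [0, 1].

    This measure is an assumption made for illustration only.
    """
    agree = sum(t if p else 1.0 - t for p, t in zip(sample, template))
    return agree / len(template)


def reduce_class(samples: List[Image], threshold: float) -> List[Image]:
    """Keep only samples whose similarity to their own class template
    does not exceed the (hypothetical) threshold.

    Samples are compared only against the template of their own class,
    never against samples of other classes.
    """
    template = class_template(samples)
    return [s for s in samples if similarity(s, template) <= threshold]
```

Because each sample is compared once against a single template of its own class, the cost of this sketch grows linearly with the number of samples in the class, which is consistent with the stated goal of reducing processing time. It would be called once per class, for example `reduce_class(samples_of_digit_3, threshold=0.9)`, where both the variable and the threshold value are placeholders.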

Findings

Experiments on Hoda, one of the largest standard handwritten numeral optical character recognition (OCR) data sets, show a 14.88 percent decrease in data set volume without a significant decrease in performance.

Practical implications

The proposed technique is effective for reducing the size of all pictorial databases, such as OCR data sets.

Originality/value

State-of-the-art algorithms currently used for data set size reduction usually remove samples near class centers, or support vector (SV) samples between different classes. However, the samples near a class center carry valuable information about class characteristics, and they are necessary to build a system model. SVs are also important samples for evaluating system efficiency. The proposed technique, unlike the other available methods, keeps both the outlier samples and the samples close to the class centers.

Open Access
Article
Publication date: 28 November 2017

Mansoor Alghamdi and William Teahan

Abstract

Purpose

The aim of this paper is to experimentally evaluate the effectiveness of the state-of-the-art printed Arabic text recognition systems to determine open areas for future improvements. In addition, this paper proposes a standard protocol with a set of metrics for measuring the effectiveness of Arabic optical character recognition (OCR) systems to assist researchers in comparing different Arabic OCR approaches.

Design/methodology/approach

This paper describes an experiment to automatically evaluate four well-known Arabic OCR systems using a set of performance metrics. The evaluation experiment is conducted on a publicly available printed Arabic dataset comprising 240 text images with a variety of resolution levels, font types, font styles and font sizes.
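The abstract does not name the performance metrics used. As an illustration of how such an automated evaluation can work, the sketch below computes a character error rate from the Levenshtein edit distance between a ground-truth transcription and an OCR output. Choosing this metric is an assumption, and the function names `edit_distance` and `character_error_rate` are hypothetical.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: minimum number of single-character insertions,
    deletions and substitutions needed to turn hypothesis into reference."""
    # prev[j] holds the distance between the processed reference prefix
    # and the first j characters of the hypothesis.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            curr.append(min(prev[j] + 1,          # character missing from hypothesis
                            curr[j - 1] + 1,      # extra character in hypothesis
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance normalised by the length of the ground-truth text."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

Python strings are sequences of Unicode code points, so the same functions apply to Arabic text without modification. Whether diacritics and ligatures should be normalised before comparison is a design decision that this sketch leaves open.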

Findings

The experimental results show that the field of character recognition for printed Arabic still requires further research to reach an efficient text recognition method for Arabic script.

Originality/value

To the best of the authors’ knowledge, this is the first work to provide a comprehensive automated evaluation of Arabic OCR systems with respect to the characteristics of Arabic script. In addition, it proposes an evaluation methodology that researchers can use as a benchmark and that will therefore contribute significantly to the enhancement of the field of Arabic script recognition.

Details

PSU Research Review, vol. 1 no. 3
Type: Research Article
ISSN: 2399-1747
