Search results

1 – 10 of 45
Open Access
Article
Publication date: 28 November 2017

Mansoor Alghamdi and William Teahan

Abstract

Purpose

The aim of this paper is to experimentally evaluate the effectiveness of the state-of-the-art printed Arabic text recognition systems to determine open areas for future improvements. In addition, this paper proposes a standard protocol with a set of metrics for measuring the effectiveness of Arabic optical character recognition (OCR) systems to assist researchers in comparing different Arabic OCR approaches.

Design/methodology/approach

This paper describes an experiment to automatically evaluate four well-known Arabic OCR systems using a set of performance metrics. The evaluation experiment is conducted on a publicly available printed Arabic dataset comprising 240 text images with a variety of resolution levels, font types, font styles and font sizes.
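The abstract does not name its metrics. As a hedged illustration only, one measure widely used in OCR evaluation is the character error rate (CER): the Levenshtein edit distance between the ground-truth text and the OCR output, divided by the length of the ground truth. Whether the paper uses exactly this definition is an assumption, and the function names below are hypothetical.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning ref into hyp."""
    prev = list(range(len(hyp) + 1))  # distances from the empty prefix of ref
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (r != h)))   # substitution (0 if equal)
        prev = cur
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Hypothetical CER helper: edit distance normalised by reference length."""
    if not reference:
        raise ValueError("reference text must be non-empty")
    return levenshtein(reference, hypothesis) / len(reference)
```

Because Python strings are sequences of Unicode code points, Arabic diacritics are counted as separate characters here; a benchmark protocol would need to state whether they are kept or stripped before scoring.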

Findings

The experimental results show that the field of character recognition for printed Arabic still requires further research to reach an efficient text recognition method for Arabic script.

Originality/value

To the best of the authors’ knowledge, this is the first work that provides a comprehensive automated evaluation of Arabic OCR systems with respect to the characteristics of Arabic script and, in addition, proposes an evaluation methodology that can be used as a benchmark by researchers and therefore will contribute significantly to the enhancement of the field of Arabic script recognition.

Details

PSU Research Review, vol. 1 no. 3
Type: Research Article
ISSN: 2399-1747

Article
Publication date: 29 April 2014

Mohammad Amin Shayegan and Saeed Aghabozorgi

Abstract

Purpose

Pattern recognition systems often have to handle the problem of large training data sets that include duplicate and similar samples. This leads to large memory requirements for storing and processing data and increases the time complexity of training algorithms. The purpose of the paper is to reduce the volume of the training part of a data set in order to increase system speed without any significant decrease in system accuracy.

Design/methodology/approach

A new technique for data set size reduction, using a version of the modified frequency diagram approach, is presented. To reduce processing time, the proposed method compares the samples of a class with other samples in the same class instead of comparing samples from different classes. It removes only those patterns that are similar to the generated class template in each class. No feature extraction was carried out, so as to produce a more precise assessment of the proposed data size reduction technique.
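The within-class comparison can be illustrated with a minimal sketch. This is not the paper's modified frequency diagram method: it assumes Euclidean distance on raw pixel vectors (consistent with no feature extraction being applied) and a hypothetical similarity threshold, and it drops a sample only when it nearly duplicates an already-retained sample of the same class. Samples are never compared across classes.

```python
import math
from collections import defaultdict


def reduce_within_class(samples, labels, threshold):
    """Hypothetical helper: drop near-duplicates, comparing only samples of the same class."""
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)

    keep = []
    for indices in by_class.values():
        kept = []  # indices already retained for this class
        for i in indices:
            # Retain sample i unless it lies within `threshold` of a retained
            # sample of the SAME class; other classes are never consulted.
            if all(math.dist(samples[i], samples[j]) > threshold for j in kept):
                kept.append(i)
        keep.extend(kept)

    keep.sort()  # restore original ordering
    return [samples[i] for i in keep], [labels[i] for i in keep]
```

The reduction can then be reported as 100 * (1 - len(reduced) / len(original)) percent, the form of the figure quoted for Hoda. The greedy pass is quadratic per class, which is still cheaper than comparing every sample against every other class.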

Findings

Experiments on Hoda, one of the largest standard handwritten numeral optical character recognition (OCR) data sets, show a 14.88 percent decrease in data set volume without a significant decrease in performance.

Practical implications

The proposed technique is effective for reducing the size of all pictorial databases, such as OCR data sets.

Originality/value

State-of-the-art algorithms currently used for data set size reduction usually remove samples near class centers, or support vector (SV) samples between different classes. However, the samples near a class center carry valuable information about class characteristics and are necessary to build a system model. SVs are also important samples for evaluating system efficiency. Unlike other available methods, the proposed technique keeps both outlier samples and samples close to the class centers.

Open Access
Article
Publication date: 28 November 2017

Mohammad Nurunnabi

Details

PSU Research Review, vol. 1 no. 3
Type: Research Article
ISSN: 2399-1747

Article
Publication date: 10 April 2023

Evagelos Varthis and Marios Poulos

Abstract

Purpose

This study aims to present metaGraphos, a crowdsourcing system that aids in the transcription and semantic enhancement of scanned documents by using a pool of volunteers or people willing to participate in exchange for a financial reward.

Design/methodology/approach

metaGraphos can be used where optical character recognition fails to produce satisfactory results, where semantic tagging or assigning thematic headings to texts is considered necessary, or when ground-truth data has to be collected in raw form.

Findings

The system automatically provides a Web-based interface comprising a static HTML page and JavaScript code that displays the scanned images of the document, coupled with the corresponding incomplete texts side by side, allowing users to correct or complete the texts in parallel.

Social implications

By assisting the parallel transcription and semantic enhancement of difficult scanned documents, the system further reveals hidden cultural wealth and aids knowledge dissemination, which contributes significantly to academic-scientific dialog and feedback.

Originality/value

Individual researchers, libraries and organizations in general may benefit from the system because it is a cost-effective, practical and easy-to-set-up client–server architecture that provides a reliable way to transcribe texts or revise transcriptions on a large scale.

Details

Collection and Curation, vol. 42 no. 4
Type: Research Article
ISSN: 2514-9326

Content available
Article
Publication date: 1 June 2005

Details

Library Hi Tech News, vol. 22 no. 5
Type: Research Article
ISSN: 0741-9058

Article
Publication date: 31 July 2020

Zainab Akhtar, Jong Weon Lee, Muhammad Attique Khan, Muhammad Sharif, Sajid Ali Khan and Naveed Riaz

Abstract

Purpose

In artificial intelligence, optical character recognition (OCR) is an active research area driven by well-known applications such as the automated transformation of printed documents into machine-readable text. The major purpose of OCR in academia and banks is to achieve significant performance in order to save storage space.

Design/methodology/approach

A novel technique is proposed for automated OCR based on multi-property feature fusion and selection. The features are fused serially, and the output is passed to a partial least squares (PLS)-based selection method. Selection is based on an entropy fitness function. The final features are classified by an ensemble classifier.
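Serial feature fusion is commonly understood as concatenating the feature vectors extracted for each property into one longer vector. The sketch below shows that step together with simple majority voting as one possible way an ensemble classifier can combine its members' outputs. The voting rule and the function names are assumptions, and the PLS-based selection with an entropy fitness function is not reproduced.

```python
from collections import Counter


def serial_fusion(*feature_vectors):
    """Serial fusion: concatenate per-property feature vectors end to end."""
    fused = []
    for vec in feature_vectors:
        fused.extend(vec)
    return fused


def majority_vote(predictions):
    """Assumed ensemble rule: return the label predicted most often.

    On a tie, Counter.most_common keeps first-seen order, so the tied
    label that appeared first wins.
    """
    return Counter(predictions).most_common(1)[0][0]
```

In a full pipeline, the fused vector would pass through feature selection before classification, so the base classifiers see fewer dimensions than the raw concatenation.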

Findings

The presented method was extensively tested on two datasets, one proposed by the authors and the Chars74k benchmark, and achieved accuracies of 91.2% and 99.9%. Compared with existing techniques, the proposed method gives improved performance.

Originality/value

The technique presented in this work will help in license plate recognition and in converting printed documents into machine-readable text.

Details

Journal of Enterprise Information Management, vol. 36 no. 3
Type: Research Article
ISSN: 1741-0398

Article
Publication date: 2 February 2018

Amgad Badewi, Essam Shehab, Jing Zeng and Mostafa Mohamad

Abstract

Purpose

The purpose of this paper is to answer two research questions: what are the ERP resources and organizational complementary resources (OCRs) required to achieve each group of benefits? And on the basis of its resources, when should an organization invest more in ERP resources and/or OCRs so that the potential value of its ERP is realised?

Design/methodology/approach

The study examines 12 organizations in different countries and validates the results with eight consultants.

Findings

An ERP benefits realization capability framework is developed. It shows that each group of benefits requires ERP resources (classified into features, attached technologies and information technology department competences) and OCRs (classified into practices, attitudes, culture, skills and organizational characteristics), and that leaping ahead to gain innovation benefits before a firm is mature enough in realising its planning and automation capabilities could be a waste of time and effort.

Research limitations/implications

This is a qualitative study; its results need to be tested by quantitative studies.

Practical implications

Although the “P” in ERP stands for planning, many academics and practitioners still believe that ERP applies to automation only. This research highlights that investing in ERP can increase the innovation and planning capabilities of the organization only if the system is extended and grown at the right time and is supported by OCRs. It is not cost-effective to push an organization to achieve all the benefits at the same time; rather, an organization will not be able to enjoy a higher level of benefits until it achieves a significant number of lower-level benefits. Thus, investing in higher-level benefit assets directly after an ERP implementation, when there are no organizational capabilities available to use these assets, could be inefficient. Moreover, it could be stressful for users to see plenty of new ERP resources without the ability to use them. Although it could be of slight benefit to introduce, for example, business intelligence to employees in the “stabilizing period” (Badewi et al., 2013), from a financial perspective it is a waste of money, since the benefits would not be realised as expected. Therefore, orchestrating ERP assets with the development of organizational capabilities is important for achieving the greatest effectiveness and efficiency of the resources available to the organization. This research can be used as a benchmark for designing the various blueprints required to achieve different groups of benefits from ERP investments.

Originality/value

This research addresses two novel questions: RQ1: what are the ERP resources and OCRs required to achieve the different kinds of ERP benefits? RQ2: when, and on what basis, should an organization deploy more resources to leverage the ERP business value?

Details

Business Process Management Journal, vol. 24 no. 1
Type: Research Article
ISSN: 1463-7154

Article
Publication date: 16 August 2021

Evagelos Varthis, Spyros Tzanavaris, Ilias Giarenis, Sozon Papavlasopoulos, Manolis Drakakis and Marios Poulos

Abstract

Purpose

This paper aims to present a methodology for the semantic enrichment of the scanned collection of Migne’s Patrologia Graeca (PG). The goal is to make it easy to locate the scanned PG source on the Web when a reference to that source is described and commented on in another scanned or textual document, and to semantically enrich PG through related scanned or textual documents, named “satellite texts”, published by third parties. The present enrichment of PG uses as satellite texts Dorotheos Scholarios’s Synoptic Index (DSSI), which acts as metadata for PG.

Design/methodology/approach

The methodology consists of two parts. The first part addresses the transcription of DSSI via a suitable web tool. The second part is divided into two subsections: interlinking the printed column numbers of each scanned PG page with its actual filename, which is done by building a matching function; and building a web interface for PG based on the Uniform Resource Identifiers (URIs) generated in the first subsection.
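A matching function of this kind can be sketched as a sorted lookup from printed column numbers to page image filenames. The JSON layout (the first printed column on each scanned page mapped to that page's filename), the filenames and the helper name are hypothetical; the paper's actual JSON format may differ.

```python
import bisect
import json

# Hypothetical matching table: first printed column on each scanned page -> image file.
MATCHING_JSON = '{"1": "pg001_0001.jpg", "3": "pg001_0002.jpg", "5": "pg001_0003.jpg"}'


def build_matcher(matching_json):
    """Return a function mapping a printed column number to its page filename."""
    table = {int(k): v for k, v in json.loads(matching_json).items()}
    starts = sorted(table)  # first column of each page, ascending

    def match(column):
        # Find the last page whose first column is <= the requested column.
        pos = bisect.bisect_right(starts, column) - 1
        if pos < 0:
            raise KeyError(f"column {column} precedes the first scanned page")
        # Note: columns past the final page also map to the final page;
        # a real table would record each page's last column as well.
        return table[starts[pos]]

    return match
```

Usage: build_matcher(MATCHING_JSON)(4) resolves column 4 to "pg001_0002.jpg", the page whose columns start at 3. A stable URI per column could then be derived from the returned filename.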

Findings

The result of the implemented methodology is a Web portal capable of providing server-less search of topics with direct (single-click) navigation to sources. The produced system is static, scalable and easy to manage, and requires minimal cost to complete and maintain. The produced data sets of transcribed DSSI and the JavaScript Object Notation (JSON) matching functions are available for personal use by students and scholars under a Creative Commons license (CC-BY-NC-SA).

Social implications

Scholars or anyone interested in a particular subject can easily locate topics in PG and reference them, using URIs that are easy to remember. This fact contributes significantly to the related scientific dialogue.

Originality/value

The methodology uses the transcribed satellite texts of DSSI, which act as metadata for PG, to semantically enrich PG collection. Furthermore, the built PG Web interface can be used by other satellite texts as a reference basis to further enrich PG, as it provides a direct identification of sources. The presented methodology is general and can be applied to any scanned collection using its own satellite texts.

Details

Information Discovery and Delivery, vol. 50 no. 2
Type: Research Article
ISSN: 2398-6247

Open Access
Article
Publication date: 26 September 2023

Mayada Aref

Abstract

Purpose

The diffusion of electronic commerce has a notable impact on the economy's prosperity. This paper embraces complexity theory principles to examine the factors affecting Internet users' acceptance and use of electronic retailers. It is essential for the sustainability of electronic retailers to understand the motivations impacting online consumer behaviour. Symmetrical and asymmetrical methods are combined to examine the relationship between perceived ease of use, perceived enjoyment, web characteristics, online consumer reviews (OCRs) and online purchase intention. Further, symmetry and differences between males and females were examined.

Design/methodology/approach

Data collected from 425 online consumers using an online structured survey was analysed using structural equation modelling (SEM) and fuzzy set qualitative comparative analysis (fsQCA). The net effects and causal configurations of the four proposed variables and online purchase intention were examined.

Findings

The SEM findings confirmed the significance of perceived enjoyment, website characteristics and OCRs on online purchase intention. Perceived enjoyment mediated the relationship between perceived ease of use and online purchase intention. The multi-group analysis confirmed the difference in antecedent impacts between males and females. The fsQCA findings revealed that multiple recipes lead to the occurrence of online purchase intention; in addition, the recipes leading to its absence do not mirror the previous ones.

Originality/value

The present study embraces complexity theory concepts in understanding online purchase intention using fsQCA methodology; further, the role of gender in online consumer behaviour was highlighted in the result discussion.

Details

Journal of Internet and Digital Economics, vol. 3 no. 1/2
Type: Research Article
ISSN: 2752-6356

Content available
Article
Publication date: 2 May 2008

Roderic Vassie

Details

Library Hi Tech News, vol. 25 no. 4
Type: Research Article
ISSN: 0741-9058
