Arabic script language identification using letter frequency neural networks

Ali Selamat (Intelligent Soft System Laboratory, Universiti Teknologi Malaysia, Skudai, Malaysia)
Choon‐Ching Ng (Intelligent Soft System Laboratory, Universiti Teknologi Malaysia, Skudai, Malaysia)

International Journal of Web Information Systems

ISSN: 1744-0084

Article publication date: 21 November 2008

Abstract

Purpose

With the rapid growth of the internet and the trend of globalization, a tremendous number of textual documents written in different languages are accessible online on the World Wide Web. Managing these multilingual documents efficiently and effectively is important to organizations and individuals. The purpose of this paper is therefore to propose letter frequency neural networks to enhance the performance of language identification.

Design/methodology/approach

Initially, the paper analyzes the feasibility of using a windowing algorithm to select features for language identification of Arabic script documents with backpropagation neural networks. Earlier experiments had found that the sliding window and non‐sliding window algorithms, used as feature selection methods, did not yield good results. Therefore, this paper proposes language identification of Arabic script documents based on letter frequency using a backpropagation neural network, with datasets of Arabic, Persian, Urdu and Pashto documents, all of which are written in Arabic script.
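As a rough illustration of what a letter frequency feature might look like, the sketch below maps a text to a vector of relative letter frequencies over a fixed alphabet. The function name `letter_frequencies`, the five-letter `ALPHABET` and the sample string are hypothetical and chosen only for this example; the abstract does not specify the paper's actual alphabet, normalization or preprocessing.

```python
from collections import Counter

def letter_frequencies(text, alphabet):
    """Return the relative frequency of each alphabet letter in text.

    Characters outside the alphabet (spaces, digits, punctuation) are ignored.
    The result has one entry per alphabet letter and sums to 1.0 when at
    least one alphabet letter occurs in the text.
    """
    counts = Counter(ch for ch in text if ch in alphabet)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(alphabet)
    return [counts[ch] / total for ch in alphabet]

# Hypothetical toy alphabet: alef, beh, peh, teh, keheh.
# A real system would cover the full letter set of the target languages.
ALPHABET = "\u0627\u0628\u067e\u062a\u06a9"

# Toy sample containing each alphabet letter once, plus a space.
sample = "\u06a9\u062a\u0627\u0628 \u067e"

features = letter_frequencies(sample, ALPHABET)
```

A vector of this kind could serve as the input layer of a neural network classifier, with one output unit per language. Letters such as peh, which is used in Persian, Urdu and Pashto but is not part of the standard Arabic alphabet, are the sort of signal such a feature can capture.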

Findings

The experiments show that the average root mean squared error of Arabic script document language identification based on the letter frequency feature selection algorithm is lower than that of the windowing algorithm.
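For reference, root mean squared error is conventionally the square root of the mean squared difference between target and predicted values. The sketch below uses that standard definition; the function name `rmse` and the example vectors are hypothetical, and the abstract does not state exactly how the paper averages the error across outputs or runs.

```python
import math

def rmse(targets, outputs):
    """Root mean squared error between two equal-length sequences."""
    if len(targets) != len(outputs) or len(targets) == 0:
        raise ValueError("targets and outputs must be non-empty and equal in length")
    squared_errors = [(t - o) ** 2 for t, o in zip(targets, outputs)]
    return math.sqrt(sum(squared_errors) / len(targets))

# Hypothetical one-hot target for a four-language classifier and a
# hypothetical network output for the same document.
target = [1.0, 0.0, 0.0, 0.0]
output = [0.5, 0.5, 0.0, 0.0]
error = rmse(target, output)
```

A lower value means the network outputs lie closer to the target vectors, which is the sense in which the two feature selection approaches are compared.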

Originality/value

This paper highlights the fact that using neural networks with proper feature selection methods will increase the performance of language identification.

Citation

Selamat, A. and Ng, C. (2008), "Arabic script language identification using letter frequency neural networks", International Journal of Web Information Systems, Vol. 4 No. 4, pp. 484-500. https://doi.org/10.1108/17440080810919503

Publisher

Emerald Group Publishing Limited

Copyright © 2008, Emerald Group Publishing Limited
