An improved pattern matching technique for lossy/lossless compression of binary printed Farsi and Arabic textual images

Hadi Grailu (Engineering Department, Tarbiat Modares University, Tehran, Iran)
Mojtaba Lotfizad (Engineering Department, Tarbiat Modares University, Tehran, Iran)
Hadi Sadoghi‐Yazdi (Engineering Department, Tarbiat Moallem University of Sabzevar, Sabzevar, Iran)

International Journal of Intelligent Computing and Cybernetics

ISSN: 1756-378X

Article publication date: 27 March 2009

Abstract

Purpose

The purpose of this paper is to propose a lossy/lossless binary textual image compression method based on an improved pattern matching (PM) technique.

Design/methodology/approach

In the Farsi/Arabic script, unlike printed Latin script, letters are usually joined to one another and produce a wide variety of patterns. As a result, some patterns are fully or partially contained in others. Two new ideas are proposed here. First, the number of library prototypes is reduced by detecting and removing fully or partially similar prototypes. Second, a new pattern encoding scheme is proposed for all types of patterns, including text and graphics. The new encoding scheme has two operation modes, chain coding and soft PM, selected according to the ratio of the pattern area to the effective length of its chain code. To encode the number sequences, the authors modify the multi‐symbol QM‐coder. The proposed method has three levels of lossy compression, each of which further increases the compression ratio. The first level applies processing in the chain code domain: omission of small patterns and holes, omission of the inner holes of characters, and smoothing of pattern boundaries. The second level applies the selective pixel reversal technique. The third level prioritizes the residual patterns for encoding according to their degree of compactness.
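The mode selection described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract states only that the mode depends on the ratio of pattern area to chain code effective length. The sketch therefore assumes a simple stand-in for that length (the count of foreground pixels touching the background), a hypothetical threshold of 2.0, and an assumed direction of comparison (compact patterns go to chain coding). The names `area`, `boundary_length`, and `choose_mode` are illustrative only.

```python
from typing import List

Pattern = List[List[int]]  # 1 = foreground (ink), 0 = background


def area(pattern: Pattern) -> int:
    """Number of foreground pixels in the pattern."""
    return sum(sum(row) for row in pattern)


def boundary_length(pattern: Pattern) -> int:
    """Stand-in for the chain code length (an assumption, not the paper's
    definition): count foreground pixels that have at least one 4-connected
    background neighbour. Pixels outside the bitmap count as background."""
    height = len(pattern)
    width = len(pattern[0]) if height else 0
    count = 0
    for y in range(height):
        for x in range(width):
            if not pattern[y][x]:
                continue
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                outside = not (0 <= ny < height and 0 <= nx < width)
                if outside or not pattern[ny][nx]:
                    count += 1
                    break
    return count


def choose_mode(pattern: Pattern, threshold: float = 2.0) -> str:
    """Pick an encoding mode from the area-to-boundary ratio.
    The threshold value and the direction of the test are hypothetical."""
    length = boundary_length(pattern)
    if length == 0:  # empty pattern: nothing to chain-code
        return "soft_pm"
    ratio = area(pattern) / length
    return "chain_coding" if ratio >= threshold else "soft_pm"


solid_block = [[1] * 10 for _ in range(10)]  # compact pattern
thin_stroke = [[1] * 8]                      # one-pixel-high stroke
print(choose_mode(solid_block), choose_mode(thin_stroke))
```

Under these assumptions, a solid block has a short boundary relative to its area and is routed to chain coding, while a thin stroke, whose pixels all lie on the boundary, is routed to soft PM.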

Findings

Experimental results show that, at 300 dpi, the proposed method outperforms the best existing binary textual image compression methods by a factor of 1.6‐3 in the lossy case and 1.3‐2.4 in the lossless case. The maximum compression ratios are achieved for Farsi and Arabic textual images.

Research limitations/implications

Only binary, printed, typeset textual images are considered.

Practical implications

The proposed method achieves a high compression ratio, which makes it suitable for archiving and storage applications.

Originality/value

To the best of the authors' knowledge, existing textual image compression methods and standards have not so far exploited the full or partial similarity of prototypes to increase the compression ratio for any script. Likewise, the idea of combining boundary description methods with run‐length and arithmetic coding techniques has not previously been used.

Citation

Grailu, H., Lotfizad, M. and Sadoghi‐Yazdi, H. (2009), "An improved pattern matching technique for lossy/lossless compression of binary printed Farsi and Arabic textual images", International Journal of Intelligent Computing and Cybernetics, Vol. 2 No. 1, pp. 120-147. https://doi.org/10.1108/17563780910939273

Publisher

Emerald Group Publishing Limited

Copyright © 2009, Emerald Group Publishing Limited