Farsi lexical analysis and stop word list

M.R. Davarpanah (Faculty of Education and Psychology, Ferdowsi University of Mashhad, Mashhad, Iran)
M. Sanji (Imam Reza University, Mashhad, Iran)
M. Aramideh (Mashhad Education Organization, Mashhad, Iran)

Library Hi Tech

ISSN: 0737-8831

Article publication date: 4 September 2009

Abstract

Purpose

The purpose of this article is to present an aggregated methodology for constructing a stop word list for the Farsi language and to generate a generic Farsi stop word list.

Design/methodology/approach

The stop word list is extracted on the basis of four sources: syntactic classes, domain dependency, corpus statistics and expert judgment. Some of the main challenges that arise in Farsi automatic text processing are also outlined.
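To make the corpus-statistics criterion concrete, here is a minimal sketch, assuming that function words appear in a large share of the documents in a corpus. The function name `corpus_stopword_candidates` and the `min_doc_ratio` cut-off are hypothetical illustrations rather than the authors' procedure, and the input is assumed to be already tokenized.

```python
from collections import Counter


def corpus_stopword_candidates(documents, min_doc_ratio=0.5):
    """Return words whose document frequency ratio is at least min_doc_ratio.

    documents: list of already-tokenized documents (lists of words).
    min_doc_ratio: hypothetical cut-off, not a value taken from the article.
    """
    if not documents:
        return set()
    doc_freq = Counter()
    for tokens in documents:
        # Count each word at most once per document (document frequency).
        doc_freq.update(set(tokens))
    n_docs = len(documents)
    return {word for word, df in doc_freq.items() if df / n_docs >= min_doc_ratio}


docs = [
    ["کتاب", "در", "کتابخانه", "است"],
    ["دانشجو", "در", "کلاس", "است"],
    ["مقاله", "و", "کتاب"],
]
print(sorted(corpus_stopword_candidates(docs, min_doc_ratio=0.6)))
```

On this toy input the content word کتاب ("book") also passes the cut-off, which illustrates why corpus statistics alone are not sufficient and are combined with syntactic classes and expert judgment.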

Findings

The results of these techniques are aggregated to generate a general Farsi stop word list containing 927 words.
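The abstract does not specify how the results of the four sources are combined. One plausible scheme, sketched here purely as an assumption, is voting: keep a word when at least `min_votes` sources propose it, then drop any word an expert rejects. The source names, `aggregate_stopwords`, `min_votes` and `expert_rejected` are all hypothetical, and the toy data cannot reproduce the 927-word list.

```python
from collections import Counter


def aggregate_stopwords(candidate_sets, min_votes=2, expert_rejected=frozenset()):
    """Vote-based aggregation of stop word candidates (hypothetical scheme).

    candidate_sets: mapping from source name to an iterable of candidate words.
    min_votes: number of sources that must propose a word for it to be kept.
    expert_rejected: words an expert has vetoed regardless of their votes.
    """
    votes = Counter()
    for candidates in candidate_sets.values():
        votes.update(set(candidates))  # each source casts at most one vote per word
    return {word for word, n in votes.items()
            if n >= min_votes and word not in expert_rejected}


sources = {
    "syntactic": {"در", "و", "از", "است"},
    "corpus": {"در", "است", "کتاب"},
    "domain": {"کتاب", "مقاله"},
}
print(sorted(aggregate_stopwords(sources, min_votes=2, expert_rejected={"کتاب"})))
```

Requiring agreement between sources, with an expert veto on top, is one way to keep frequent content words such as کتاب out of a generic list.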

Practical implications

The resulting stop word list can affect the efficiency and effectiveness of retrieval and indexing in Farsi information retrieval systems; moreover, it can play an important role in Farsi text segmentation.
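As an illustration of where a stop word list enters the indexing process, the sketch below builds a small inverted index and skips stop words so they never receive postings. It assumes naive whitespace tokenization, which is a simplification given the segmentation ambiguities of Farsi text; `build_index` and the sample stop words are hypothetical.

```python
from collections import defaultdict


def build_index(documents, stopwords):
    """Build a term -> set-of-document-ids inverted index, skipping stop words.

    Tokenization is a naive whitespace split (an assumption; real Farsi
    segmentation must also handle the zero-width non-joiner and attached affixes).
    """
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in text.split():
            if token not in stopwords:
                index[token].add(doc_id)
    return dict(index)


stopwords = {"در", "و", "از", "است"}
docs = {1: "کتاب در کتابخانه است", 2: "مقاله و کتاب"}
print(build_index(docs, stopwords))
```

Because high-frequency function words would otherwise produce the longest posting lists, filtering them shrinks the index without removing the content terms a query is likely to target.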

Originality/value

Our stop word extraction algorithm is a promising technique that could be applied to other languages with ambiguities in automatic text segmentation.

Citation

Davarpanah, M.R., Sanji, M. and Aramideh, M. (2009), "Farsi lexical analysis and stop word list", Library Hi Tech, Vol. 27 No. 3, pp. 435-449. https://doi.org/10.1108/07378830910988559

Publisher

Emerald Group Publishing Limited

Copyright © 2009, Emerald Group Publishing Limited