Stylometric analysis of classical Arabic texts for genre detection

Maha Al-Yahya (Department of Information Technology, King Saud University, Riyadh, Saudi Arabia)

The Electronic Library

ISSN: 0264-0473

Article publication date: 4 October 2018

Issue publication date: 5 November 2018

Abstract

Purpose

In information retrieval, the genre of a text is as important as its content, and knowing the genre allows a search engine to offer customized retrieval. The purpose of this study is to explore and evaluate the use of stylometric analysis, a quantitative analysis of the linguistic features of text, to support automated genre detection for Classical Arabic text.

Design/methodology/approach

Unsupervised clustering and supervised classification were applied to the King Saud University Corpus of Classical Arabic (KSUCCA), using the most frequent words (MFWs) in the corpus as stylometric features. Four popular distance measures established in stylometric research were evaluated for the genre detection task.
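
The MFW-based approach described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the study's implementation: the abstract does not name the four distance measures, so Burrows's Delta is used here as one well-known example from stylometric research; whitespace tokenization, the toy transliterated documents, and the function names `mfw_profile`, `zscores` and `burrows_delta` are all hypothetical.

```python
from collections import Counter
import math


def mfw_profile(texts, n_mfw):
    """Pick the n_mfw most frequent words in the whole corpus and return
    them together with one relative-frequency vector per text."""
    # Assumption: plain whitespace tokenization; real Classical Arabic text
    # would need proper normalization and tokenization.
    tokenized = [t.split() for t in texts]
    corpus_counts = Counter(w for toks in tokenized for w in toks)
    mfw = [w for w, _ in corpus_counts.most_common(n_mfw)]
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1  # avoid division by zero for empty texts
        vectors.append([counts[w] / total for w in mfw])
    return mfw, vectors


def zscores(vectors):
    """Standardize each MFW dimension across all texts."""
    n, dims = len(vectors), len(vectors[0])
    means = [sum(v[j] for v in vectors) / n for j in range(dims)]
    stds = [
        math.sqrt(sum((v[j] - means[j]) ** 2 for v in vectors) / n)
        for j in range(dims)
    ]
    return [
        [(v[j] - means[j]) / stds[j] if stds[j] > 0 else 0.0 for j in range(dims)]
        for v in vectors
    ]


def burrows_delta(za, zb):
    """Burrows's Delta: mean absolute difference of z-scored frequencies."""
    return sum(abs(a - b) for a, b in zip(za, zb)) / len(za)


# Toy documents (hypothetical): the first two share vocabulary, the third does not.
docs = [
    "qala qala fa inna qala fa",
    "qala fa qala inna fa wa",
    "wa wa min wa ila min wa",
]
mfw, vectors = mfw_profile(docs, n_mfw=4)
z = zscores(vectors)
print(mfw)
print(burrows_delta(z[0], z[1]), burrows_delta(z[0], z[2]))
```

Pairwise distances computed this way could then feed a hierarchical clustering routine or a nearest-neighbour classifier, which is one plausible reading of the clustering and classification setup described above.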

Findings

The experimental results show that stylometry-based genre clustering and classification align well with human-defined genres. The evidence suggests that genre style signals exist in Classical Arabic and can be used to support automated genre detection.

Originality/value

This work targets genre detection in Classical Arabic text using stylometric features, an approach that has previously been applied only to Arabic authorship attribution. The study also compares four distance measures used in stylometric analysis, applying clustering and classification to the KSUCCA, a corpus of over 50 million words of Classical Arabic.

Citation

Al-Yahya, M. (2018), "Stylometric analysis of classical Arabic texts for genre detection", The Electronic Library, Vol. 36 No. 5, pp. 842-855. https://doi.org/10.1108/EL-11-2017-0236

Publisher

Emerald Publishing Limited

Copyright © 2018, Emerald Publishing Limited
