Books and journals Case studies Expert Briefings Open Access
Advanced search

An extensive study of authorship authentication of Arabic articles

Mahmoud Al-Ayyoub (Department of Computer Science, Jordan University of Science and Technology, Irbid, Jordan)
Ahmed Alwajeeh (Department of Computer Science, Jordan University of Science and Technology, Irbid, Jordan)
Ismail Hmeidi (Department of Computer Science, Jordan University of Science and Technology, Irbid, Jordan)

International Journal of Web Information Systems

ISSN: 1744-0084

Publication date: 18 April 2017

Abstract

Purpose

The authorship authentication (AA) problem is concerned with correctly attributing a text document to its corresponding author. Historically, this problem has been the focus of various studies focusing on the intuitive idea that each author has a unique style that can be captured using stylometric features (SF). Another approach to this problem, known as the bag-of-words (BOW) approach, uses keywords occurrences/frequencies in each document to identify its author. Unlike the first one, this approach is more language-independent. This paper aims to study and compare both approaches focusing on the Arabic language which is still largely understudied despite its importance.

Design/methodology/approach

Being a supervised learning problem, the authors start by collecting a very large data set of Arabic documents to be used for training and testing purposes. For the SF approach, they compute hundreds of SF, whereas, for the BOW approach, the popular term frequency-inverse document frequency technique is used. Both approaches are compared under various settings.

Findings

The results show that the SF approach, which is much cheaper to train, can generate more accurate results under most settings.

Practical implications

Numerous advantages of efficiently solving the AA problem are obtained in different fields of academia as well as the industry including literature, security, forensics, electronic markets and trading, etc. Another practical implication of this work is the public release of its sources. Specifically, some of the SF can be very useful for other problems such as sentiment analysis.

Originality/value

This is the first study of its kind to compare the SF and BOW approaches for authorship analysis of Arabic articles. Moreover, many of the computed SF are novel, while other features are inspired by the literature. As SF are language-dependent and most existing papers focus on English, extra effort must be invested to adapt such features to Arabic text.

Keywords

  • Arabic text processing
  • Authorship authentication
  • Bag-of-words
  • Stylometric features

Citation

Al-Ayyoub, M., Alwajeeh, A. and Hmeidi, I. (2017), "An extensive study of authorship authentication of Arabic articles", International Journal of Web Information Systems, Vol. 13 No. 1, pp. 85-104. https://doi.org/10.1108/IJWIS-03-2016-0011

Download as .RIS

Publisher

:

Emerald Publishing Limited

Copyright © 2017, Emerald Publishing Limited

Please note you do not have access to teaching notes

You may be able to access teaching notes by logging in via Shibboleth, Open Athens or with your Emerald account.
Login
If you think you should have access to this content, click the button to contact our support team.
Contact us

To read the full version of this content please select one of the options below

You may be able to access this content by logging in via Shibboleth, Open Athens or with your Emerald account.
Login
To rent this content from Deepdyve, please click the button.
Rent from Deepdyve
If you think you should have access to this content, click the button to contact our support team.
Contact us
Emerald Publishing
  • Opens in new window
  • Opens in new window
  • Opens in new window
  • Opens in new window
© 2021 Emerald Publishing Limited

Services

  • Authors Opens in new window
  • Editors Opens in new window
  • Librarians Opens in new window
  • Researchers Opens in new window
  • Reviewers Opens in new window

About

  • About Emerald Opens in new window
  • Working for Emerald Opens in new window
  • Contact us Opens in new window
  • Publication sitemap

Policies and information

  • Privacy notice
  • Site policies
  • Modern Slavery Act Opens in new window
  • Chair of Trustees governance statement Opens in new window
  • COVID-19 policy Opens in new window
Manage cookies

We’re listening — tell us what you think

  • Something didn’t work…

    Report bugs here

  • All feedback is valuable

    Please share your general feedback

  • Member of Emerald Engage?

    You can join in the discussion by joining the community or logging in here.
    You can also find out more about Emerald Engage.

Join us on our journey

  • Platform update page

    Visit emeraldpublishing.com/platformupdate to discover the latest news and updates

  • Questions & More Information

    Answers to the most commonly asked questions here