Search results

1 – 10 of over 1000
Article
Publication date: 18 March 2019

Indexing Arabic texts using association rule data mining

Ramzi A. Haraty and Rouba Nasrallah

Abstract

Purpose

The purpose of this paper is to propose a new model to enhance the auto-indexing of Arabic texts. The model extracts new relevant words by using data mining rules to relate them to the words chosen by previous classical methods.

Design/methodology/approach

The proposed model uses an association rule algorithm that extracts frequent sets of related items. It uses these sets to relate words in the texts to be indexed to words from other texts in the same category. The extracted word associations are represented as sets of words that frequently appear together.
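
The idea of frequent word sets can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the brute-force counting (no Apriori-style pruning), the `min_support` value and the English stand-in tokens are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def frequent_word_sets(documents, min_support=2, max_size=2):
    """Count word sets that co-occur in at least `min_support` documents.

    Brute-force counting for illustration; Apriori-style pruning is omitted.
    """
    counts = Counter()
    for words in documents:
        unique = sorted(set(words))
        for size in range(1, max_size + 1):
            counts.update(combinations(unique, size))
    return {itemset: n for itemset, n in counts.items() if n >= min_support}


def expand_index_terms(seed_terms, frequent_sets):
    """Return words that share a frequent set with any seed term."""
    seeds = set(seed_terms)
    related = set()
    for itemset in frequent_sets:
        if len(itemset) > 1 and seeds & set(itemset):
            related.update(itemset)
    return sorted(related - seeds)


# Hypothetical, already stemmed and stop-word-filtered texts of one category.
category_docs = [
    ["bank", "loan", "interest"],
    ["bank", "loan", "credit"],
    ["bank", "interest", "rate"],
]
frequent = frequent_word_sets(category_docs, min_support=2)
print(expand_index_terms(["loan"], frequent))  # ['bank']
```

Here "loan" stands for a term chosen by a classical indexing method, and "bank" is added because the two words appear together in at least two texts of the category.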

Findings

The proposed methodology shows significant enhancement in terms of accuracy, efficiency and reliability when compared to previous works.

Research limitations/implications

The stemming algorithm can be further enhanced. Arabic has many grammatical rules, and the more of these rules are integrated into the stemming algorithm, the better the stemming will be. The stop-list can also be enhanced by adding more words that should not be taken into consideration in the indexing mechanism. Numbers should be added to the list as well, and a thesaurus system should be used, because it links different phrases or words with the same meaning to each other, which improves the indexing mechanism. The authors also invite researchers to add more pre-requisite texts to obtain better results.

Originality/value

In this paper, the authors present a full text-based auto-indexing method for Arabic text documents. The auto-indexing method extracts new relevant words by using data mining rules, which has not been investigated before. The method uses an association rule mining algorithm for extracting frequent sets containing related items to extract relationships between words in the texts to be indexed with words from texts that belong to the same category. The benefits of the method are demonstrated using empirical work involving several Arabic texts.

Details

Library Hi Tech, vol. 37 no. 1
Type: Research Article
DOI: https://doi.org/10.1108/LHT-07-2017-0147
ISSN: 0737-8831

Keywords

  • Precision
  • Recall
  • Arabic text
  • Auto-indexing
  • Frequent sets
  • Rule-based data mining

Article
Publication date: 3 April 2020

Deep neural networks for Arabic information extraction

Abdelhalim Saadi and Hacene Belhadef

Abstract

Purpose

The purpose of this paper is to present a system based on deep neural networks that extracts particular entities from natural language text, given that a massive amount of textual information is now available electronically. Such a large amount of electronic text makes it very difficult to find or extract relevant information.

Design/methodology/approach

This study presents an original system that extracts Arabic named entities by combining a deep neural network-based part-of-speech tagger with a neural network-based named entity extractor. First, the system extracts the grammatical classes of the words with high precision, depending on the context of each word. This module performs disambiguation. Then, a second module extracts the named entities.

Findings

Using deep neural networks in natural language processing requires tuning many hyperparameters, which is a time-consuming process. Statistical methods such as the Taguchi method are therefore in high demand for dealing with this problem. In this study, the system is successfully applied to Arabic named entity recognition, where an accuracy of 96.81 per cent was reported, which is better than the state-of-the-art results.

Research limitations/implications

The system is designed and trained for the Arabic language, but the architecture can be used for other languages.

Practical implications

Information extraction systems are developed for different applications, such as analysing newspaper articles and databases for commercial, political and social objectives. Information extraction systems also can be built over an information retrieval (IR) system. The IR system eliminates irrelevant documents and paragraphs.

Originality/value

The proposed system can be regarded as the first attempt to use double deep neural networks to increase the accuracy. It can also be built over an IR system, which eliminates irrelevant documents and paragraphs. This process reduces the large number of documents from which the authors wish to extract the relevant information using an information extraction system.

Details

Smart and Sustainable Built Environment, vol. 9 no. 4
Type: Research Article
DOI: https://doi.org/10.1108/SASBE-03-2019-0031
ISSN: 2046-6099

Keywords

  • Deep neural networks
  • Part-of-speech tagging
  • Named entity recognition
  • Statistical machine translation
  • Natural language processing
  • Smart cities

Article
Publication date: 18 April 2017

An extensive study of authorship authentication of Arabic articles

Mahmoud Al-Ayyoub, Ahmed Alwajeeh and Ismail Hmeidi

Abstract

Purpose

The authorship authentication (AA) problem is concerned with correctly attributing a text document to its corresponding author. Historically, this problem has been the focus of various studies built on the intuitive idea that each author has a unique style that can be captured using stylometric features (SF). Another approach to this problem, known as the bag-of-words (BOW) approach, uses keyword occurrences/frequencies in each document to identify its author. Unlike the first one, this approach is more language-independent. This paper aims to study and compare both approaches, focusing on the Arabic language, which is still largely understudied despite its importance.

Design/methodology/approach

Because this is a supervised learning problem, the authors start by collecting a very large data set of Arabic documents to be used for training and testing purposes. For the SF approach, they compute hundreds of SF, whereas, for the BOW approach, the popular term frequency-inverse document frequency technique is used. Both approaches are compared under various settings.
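
As a rough illustration of the term frequency-inverse document frequency weighting mentioned above, the following Python sketch computes one weight per term and document. It assumes the plain textbook formulas (tf = count / document length, idf = log(N / df)); the study's exact variant, tokenisation and classifier are not stated in the abstract, and the function name is hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Return one {term: weight} dict per tokenised document.

    Uses tf = count / document length and idf = log(N / df); real toolkits
    often apply smoothed variants of these formulas.
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))  # each document counts a term once
    vectors = []
    for tokens in documents:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


# Hypothetical tokenised documents.
docs = [["a", "b", "a"], ["a", "c"]]
vectors = tfidf_vectors(docs)
```

A term that occurs in every document, such as "a" here, receives weight zero, while terms confined to one document are weighted up. The resulting vectors would then be passed to a supervised classifier, which is omitted.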

Findings

The results show that the SF approach, which is much cheaper to train, can generate more accurate results under most settings.

Practical implications

Efficiently solving the AA problem offers numerous advantages in different fields of academia and industry, including literature, security, forensics, and electronic markets and trading. Another practical implication of this work is the public release of its sources. Specifically, some of the SF can be very useful for other problems such as sentiment analysis.

Originality/value

This is the first study of its kind to compare the SF and BOW approaches for authorship analysis of Arabic articles. Moreover, many of the computed SF are novel, while other features are inspired by the literature. As SF are language-dependent and most existing papers focus on English, extra effort must be invested to adapt such features to Arabic text.

Details

International Journal of Web Information Systems, vol. 13 no. 1
Type: Research Article
DOI: https://doi.org/10.1108/IJWIS-03-2016-0011
ISSN: 1744-0084

Keywords

  • Arabic text processing
  • Authorship authentication
  • Bag-of-words
  • Stylometric features

Article
Publication date: 1 December 1999

The implementation of the Arabic script in OLIB7 at the American University of Beirut Libraries

Katia Medawar

Abstract

Software for processing bibliographic materials in the Arabic language is a relatively recent development. When libraries in parts of the Middle East, where Arabic is the main language, started automating their collections, most library systems did not provide for the use of Arabic script and this capability had to be developed. Automated library systems (such as Minisis, ALEPH, Dobis/Libis, TinLib and OLIB) started to emerge to fill the gap for non‐Roman scripts. This article describes the stages the American University of Beirut Libraries went through in converting their Arabic materials for use in the OLIB7 library management system. A description of the background of the library is given along with the details of the romanisation process, the conversion process, the software and hardware chosen, the testing of the database, problems encountered, output and the handling of authority records.

Details

Program, vol. 33 no. 4
Type: Research Article
DOI: https://doi.org/10.1108/EUM0000000006920
ISSN: 0033-0337

Keywords

  • Computer software
  • Foreign languages
  • Libraries

Article
Publication date: 28 November 2017

Experimental evaluation of Arabic OCR systems

Mansoor Alghamdi and William Teahan

Open Access

Abstract

Purpose

The aim of this paper is to experimentally evaluate the effectiveness of the state-of-the-art printed Arabic text recognition systems to determine open areas for future improvements. In addition, this paper proposes a standard protocol with a set of metrics for measuring the effectiveness of Arabic optical character recognition (OCR) systems to assist researchers in comparing different Arabic OCR approaches.

Design/methodology/approach

This paper describes an experiment to automatically evaluate four well-known Arabic OCR systems using a set of performance metrics. The evaluation experiment is conducted on a publicly available printed Arabic dataset comprising 240 text images with a variety of resolution levels, font types, font styles and font sizes.
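
The abstract does not name the metrics it uses. One measure commonly used for OCR output is the character error rate, sketched below in Python as the Levenshtein edit distance between the ground truth and the recognised text, divided by the ground-truth length. Treating this as one of the paper's metrics is an assumption, and the function names are hypothetical.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance computed row by row with dynamic programming."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference, hypothesis):
    """Edit distance normalised by the length of the ground-truth text."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


print(character_error_rate("kitten", "sitting"))  # 0.5
```

Python strings are sequences of Unicode code points, so the same functions apply to Arabic text, although combining diacritics would be counted as separate characters.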

Findings

The experimental results show that the field of character recognition for printed Arabic still requires further research to reach an efficient text recognition method for Arabic script.

Originality/value

To the best of the authors’ knowledge, this is the first work that provides a comprehensive automated evaluation of Arabic OCR systems with respect to the characteristics of Arabic script and, in addition, proposes an evaluation methodology that can be used as a benchmark by researchers and therefore will contribute significantly to the enhancement of the field of Arabic script recognition.

Details

PSU Research Review, vol. 1 no. 3
Type: Research Article
DOI: https://doi.org/10.1108/PRR-05-2017-0026
ISSN: 2399-1747

Keywords

  • Performance evaluation
  • Performance metrics
  • Arabic OCR
  • OCR

Article
Publication date: 1 April 1992

Arabic script on RLIN

Joan M. Aliprand

Abstract

Arabic script is the most recent addition to the scripts available on the Research Libraries Information Network (RLIN). Bibliographic control and retrieval using the authentic writing system are available for titles in Arabic, Persian (Farsi), Urdu, Ottoman Turkish, and other languages written with Arabic script. RLIN is the world's largest bibliographic database for Middle Eastern language material. This paper is a comprehensive description of the Arabic script features of RLIN. It covers Arabic character sets and RLIN's character repertoire for Arabic script; how Arabic characters are input and stored in the RLIN database; the equipment needed for Arabic script support; the indexing, retrieval, and presentation of records containing Arabic script; the inclusion of non‐Roman data in USMARC bibliographic records; and statistics on the RLIN databases. Sidebars explain features of Arabic writing. The discussion of data storage and presentation of text is relevant to any computer application that involves Arabic script.

Details

Library Hi Tech, vol. 10 no. 4
Type: Research Article
DOI: https://doi.org/10.1108/eb047865
ISSN: 0737-8831

Article
Publication date: 28 September 2007

Mubser: a bilingual Braille to text translation with an Arabic interface

AbdulMalik Al‐Salman, Mohamed Alkanhal, Yousef AlOhali, Hazem Al‐Rashed and Bander Al‐Sulami

Abstract

Purpose

The purpose of this paper is to describe the development of a system called Mubser to translate Arabic and English Braille into normal text. The system can automatically detect the source language and the Braille grade.

Design/methodology/approach

The Mubser system was designed for the MS‐Windows environment and implemented in Visual C# 2.0 with an Arabic interface. The system uses a rule file to translate supported languages from Braille to text. The rule file is in XML format. The identification of the source language and grade is based on a statistical approach.
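
To make the rule-file idea concrete, here is a minimal Python sketch of a table-driven Braille-to-text translation loaded from XML. The element and attribute names (`rules`, `rule`, `braille`, `text`) are hypothetical and do not reflect Mubser's actual rule format, and Mubser itself is written in C#. The sketch handles only a one-cell-to-one-letter mapping and ignores contractions and language detection.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule file: each rule maps one Unicode Braille cell to text.
RULES_XML = """
<rules language="english" grade="1">
  <rule braille="⠁" text="a"/>
  <rule braille="⠃" text="b"/>
  <rule braille="⠉" text="c"/>
</rules>
"""


def load_rules(xml_text):
    """Parse the rule file into a {braille cell: text} dictionary."""
    root = ET.fromstring(xml_text)
    return {rule.get("braille"): rule.get("text") for rule in root.iter("rule")}


def translate(cells, rules, unknown="?"):
    """Translate cell by cell, marking cells without a rule as unknown."""
    return "".join(rules.get(cell, unknown) for cell in cells)


rules = load_rules(RULES_XML)
print(translate("⠉⠁⠃", rules))  # cab
```

Keeping the mapping in a data file rather than in code means that, under this design, supporting another language would only require another rule file.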

Findings

From the literature review, the authors found that most research and products do not support bilingual translation from Braille to text in either contracted or un‐contracted Braille. The Mubser system is a robust system that fills that gap. It helps both visually impaired and sighted people, especially Arabic native speakers, to translate from Braille to text.

Research limitations/implications

Mubser is being implemented and tested by the authors for both Arabic and English languages. The tests performed so far have shown excellent results. In the future, it is planned to integrate the system with an optical Braille recognition system, enhance the system to accept new languages, support maths and scientific symbols, and add spell checkers.

Practical implications

There is a desperate need for such a system to translate Braille into normal text. This system helps both sighted and blind people to communicate better.

Originality/value

This paper presents a novel system for converting Braille codes (Arabic and English) into normal text.

Details

International Journal of Web Information Systems, vol. 3 no. 3
Type: Research Article
DOI: https://doi.org/10.1108/17440080710834274
ISSN: 1744-0084

Keywords

  • Braille
  • Reading aids
  • Translation services
  • Languages

Article
Publication date: 7 November 2016

A lexicon based approach for classifying Arabic multi-labeled text

Ismail Hmeidi, Mahmoud Al-Ayyoub, Nizar A. Mahyoub and Mohammed A. Shehab

Abstract

Purpose

Multi-label Text Classification (MTC) is one of the most recent research trends in the data mining and information retrieval domains for many reasons, such as the rapid growth of online data and the increasing tendency of internet users to be more comfortable with assigning multiple labels/tags to describe documents, emails, posts, etc. The dimensionality of labels makes MTC more difficult and challenging than traditional single-labeled text classification (TC). Because MTC is a natural extension of TC, several ways have been proposed to benefit from the rich literature of TC through what are called problem transformation (PT) methods. Basically, PT methods transform the multi-label data into single-label data suitable for traditional single-label classification algorithms. Another approach is to design novel classification algorithms customized for MTC. Over the past decade, several works have appeared on both approaches, focusing mainly on the English language. This work aims to present an elaborate study of MTC of Arabic articles.

Design/methodology/approach

This paper presents a novel lexicon-based method for MTC, where the keywords that are most associated with each label are extracted from the training data along with a threshold that can later be used to determine whether each test document belongs to a certain label.
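
A minimal Python sketch of a lexicon-plus-threshold classifier is given below. It is only an illustration of the general idea: using raw word frequency as the association measure, a single fixed `threshold` supplied as a parameter (rather than one extracted from the training data), and all names and toy tokens are assumptions, not the paper's method.

```python
from collections import Counter, defaultdict


def build_lexicons(training_docs, top_k=2):
    """Pick, per label, the `top_k` most frequent words in documents with that label."""
    counts = defaultdict(Counter)
    for tokens, labels in training_docs:
        for label in labels:
            counts[label].update(tokens)
    return {
        label: {word for word, _ in counter.most_common(top_k)}
        for label, counter in counts.items()
    }


def predict_labels(tokens, lexicons, threshold=0.3):
    """Assign every label whose lexicon covers at least `threshold` of the tokens."""
    if not tokens:
        return set()
    predicted = set()
    for label, lexicon in lexicons.items():
        score = sum(token in lexicon for token in tokens) / len(tokens)
        if score >= threshold:
            predicted.add(label)
    return predicted


# Hypothetical multi-labelled training documents (tokens, label set).
training = [
    (["match", "team", "goal", "team"], {"sport"}),
    (["vote", "party", "vote", "election"], {"politics"}),
    (["team", "vote", "match", "party"], {"sport", "politics"}),
]
lexicons = build_lexicons(training, top_k=2)
print(predict_labels(["team", "match", "vote", "stadium"], lexicons))  # {'sport'}
```

Because each label is tested independently against its own lexicon, a document can receive zero, one or several labels, which is what distinguishes the multi-label setting from single-label classification.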

Findings

The experiments show that the presented approach outperforms the currently available approaches. Specifically, the best accuracy obtained from existing approaches is only 18 per cent, whereas the presented lexicon-based approach can reach an accuracy of 31 per cent.

Originality/value

Although there exist some tools that can be customized to address the MTC problem for Arabic text, their accuracies are very low when applied to Arabic articles. This paper presents a novel method for MTC. The experiments show that the presented approach outperforms the currently available approaches.

Details

International Journal of Web Information Systems, vol. 12 no. 4
Type: Research Article
DOI: https://doi.org/10.1108/IJWIS-01-2016-0002
ISSN: 1744-0084

Keywords

  • Label-set dimensionality
  • Lexicon-based multi-label classification
  • ML-Accuracy
  • Multi-label data
  • Single-label data

Article
Publication date: 21 November 2008

Arabic script language identification using letter frequency neural networks

Ali Selamat and Choon‐Ching Ng

Abstract

Purpose

With the rapid emergence and explosion of the internet and the trend of globalization, a tremendous number of textual documents written in different languages are electronically accessible online from the world wide web. Efficiently and effectively managing these documents written in different languages is important to organizations and individuals. Therefore, the purpose of this paper is to propose letter frequency neural networks to enhance the performance of language identification.

Design/methodology/approach

Initially, the paper analyzes the feasibility of using a windowing algorithm to find the best method for selecting features for the language identification of Arabic script documents using backpropagation neural networks. Previously, it had been found that the sliding window and non‐sliding window algorithms used as feature selection methods in the experiments did not yield good results. Therefore, this paper proposes language identification of Arabic script documents based on letter frequency using a backpropagation neural network. The datasets used consist of Arabic, Persian, Urdu and Pashto documents, all of which are written in Arabic script.
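
The letter-frequency features can be sketched in Python as a fixed-length vector of relative letter frequencies. The four-letter alphabet below is a hypothetical stand-in for a full Arabic-script letter inventory, the function name is an assumption, and the backpropagation network that would consume the vector is omitted.

```python
from collections import Counter


def letter_frequencies(text, alphabet):
    """Relative frequency of each letter in `alphabet`, in alphabet order.

    Characters outside the alphabet (spaces, digits, punctuation) are ignored.
    """
    counts = Counter(ch for ch in text if ch in alphabet)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(alphabet)
    return [counts[ch] / total for ch in alphabet]


# Tiny hypothetical alphabet; the last two letters are used in Persian
# but are not part of the standard Arabic alphabet.
ALPHABET = ["ا", "ب", "پ", "چ"]
features = letter_frequencies("باب پ", ALPHABET)
print(features)  # [0.25, 0.5, 0.25, 0.0]
```

Letters that exist in only some Arabic-script languages, such as the Persian letters in this toy alphabet, give the vector components that help separate the languages.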

Findings

The experiments have shown that the average root mean squared error of Arabic script document language identification based on the letter frequency feature selection algorithm is lower than that of the windowing algorithm.

Originality/value

This paper highlights the fact that using neural networks with proper feature selection methods will increase the performance of language identification.

Details

International Journal of Web Information Systems, vol. 4 no. 4
Type: Research Article
DOI: https://doi.org/10.1108/17440080810919503
ISSN: 1744-0084

Keywords

  • Neural net
  • Programming and algorithm theory
  • Algorithmic languages

Article
Publication date: 12 June 2019

The impact of using different keyboards on free-text keystroke dynamics authentication for Arabic language

Suliman A. Alsuhibany, Muna Almushyti, Noorah Alghasham and Fatimah Alkhudhayr

Abstract

Purpose

Nowadays, there is a high demand for online services and applications. However, it is a challenge to keep these applications secure by applying methods other than traditional approaches such as passwords and usernames. Keystroke dynamics is one of the alternative authentication methods that provide a high level of security, and the keyboard used plays an important role in the recognition accuracy. To guarantee the robustness of a system in different practical situations, there is a need to examine how much the performance of the system is affected by changing the keyboard layout. This paper aims to investigate the impact of using different keyboards on the recognition accuracy for Arabic free-text typing.

Design/methodology/approach

To evaluate how much the performance of the system is affected by changing the keyboard layout, an experimental study is conducted using two different keyboards: a Mac keyboard and an HP keyboard.

Findings

With the Mac keyboard, the false rejection rate (FRR) was 0.20 and the false acceptance rate (FAR) was 0.44. With the HP keyboard, these values changed: the FRR was 0.08 and the FAR was 0.60.
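
For readers unfamiliar with the two rates, the Python sketch below shows how they are conventionally computed: FRR is the share of genuine attempts that were rejected, and FAR is the share of impostor attempts that were accepted. The function name and the attempt counts are hypothetical and are not taken from the study.

```python
def error_rates(genuine_accepted, genuine_rejected,
                impostor_accepted, impostor_rejected):
    """Return (FRR, FAR) from counts of authentication attempts."""
    genuine_total = genuine_accepted + genuine_rejected
    impostor_total = impostor_accepted + impostor_rejected
    if genuine_total == 0 or impostor_total == 0:
        raise ValueError("need at least one genuine and one impostor attempt")
    frr = genuine_rejected / genuine_total    # legitimate users turned away
    far = impostor_accepted / impostor_total  # impostors let through
    return frr, far


# Hypothetical counts for illustration only.
frr, far = error_rates(genuine_accepted=34, genuine_rejected=6,
                       impostor_accepted=10, impostor_rejected=30)
print(frr, far)  # 0.15 0.25
```

The two rates usually trade off against each other: a stricter acceptance threshold lowers the FAR but raises the FRR.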

Research limitations/implications

The number of participants in the experiment was a limitation, as the authors had targeted many more participants.

Originality/value

These results show for the first time the impact of keyboards on the system’s recognition accuracy when using Arabic free-text typing.

Details

Information & Computer Security, vol. 27 no. 2
Type: Research Article
DOI: https://doi.org/10.1108/ICS-09-2017-0062
ISSN: 2056-4961

Keywords

  • Information security
  • Arabic language
  • Keystroke dynamics
  • Authentication
  • Keyboard-layout
