Search results

1 – 10 of over 1000
Article
Publication date: 16 August 2021

Nael Alqtati, Jonathan A.J. Wilson and Varuna De Silva

Abstract

Purpose

This paper aims to equip professionals and researchers in the fields of advertising, branding, public relations, marketing communications, social media analytics and marketing with a simple, effective and dynamic means of evaluating consumer behavioural sentiments and engagement through Arabic language and script, in vivo.

Design/methodology/approach

Using quantitative and qualitative situational linguistic analyses of Classical Arabic, found in Quranic and religious texts; Modern Standard Arabic, which is commonly used in formal Arabic channels; and dialectal Arabic, which varies hugely from one Arab country to another, this study analyses rich marketing and consumer messages (tweets) as a basis for developing an Arabic language social media methodological tool.

Findings

Despite the popularity of Arabic language communication on social media platforms across geographies, currently, comprehensive language processing toolkits for analysing Arabic social media conversations have limitations and require further development. Furthermore, due to its unique morphology, developing text understanding capabilities specific to the Arabic language poses challenges.

Practical implications

This study demonstrates the application and effectiveness of the proposed methodology on a random sample of Twitter data from Arabic-speaking regions. Furthermore, as Arabic is the language of Islam, the study is of particular importance to Islamic and Muslim geographies, markets and marketing.

Social implications

The findings suggest that the proposed methodology has a wider potential beyond the data set and health-care sector analysed, and therefore, can be applied to further markets, social media platforms and consumer segments.

Originality/value

To remedy these gaps, this study presents a new methodology and analytical approach to investigating Arabic language social media conversations, which brings together a multidisciplinary knowledge of technology, data science and marketing communications.

Article
Publication date: 20 June 2018

Ramzi A. Haraty and Rouba Nasrallah

Abstract

Purpose

The purpose of this paper is to propose a new model to enhance the auto-indexing of Arabic texts. The model extracts new relevant words by using data mining rules to relate the words chosen by previous classical methods to new words.

Design/methodology/approach

The proposed model uses an association rule algorithm to extract frequent sets of related items, and thereby relationships between words in the texts to be indexed and words from texts that belong to the same category. The extracted word associations are represented as sets of words that frequently appear together.
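As a rough illustration of the idea of frequent word sets, the sketch below counts word pairs that co-occur in a minimum share of documents and uses them to expand a set of seed index terms. It is a simplified stand-in for a full association rule algorithm, not the authors' implementation: the function names, the support threshold and the placeholder English tokens are all assumptions made for this example.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(documents, min_support):
    """Return word pairs whose document-level support is at least min_support.

    Support is the fraction of documents in which both words occur.
    """
    n = len(documents)
    pair_counts = Counter()
    for words in documents:
        # Sort the unique words so each pair has one canonical ordering.
        unique = sorted(set(words))
        pair_counts.update(combinations(unique, 2))
    return {
        pair: count / n
        for pair, count in pair_counts.items()
        if count / n >= min_support
    }


def expand_index_terms(seed_terms, pairs):
    """Add every word that forms a frequent pair with one of the seed terms."""
    expanded = set(seed_terms)
    for first, second in pairs:
        if first in seed_terms:
            expanded.add(second)
        if second in seed_terms:
            expanded.add(first)
    return expanded


# Hypothetical pre-tokenised, stemmed documents from one category.
docs = [
    ["bank", "loan", "interest"],
    ["bank", "loan", "credit"],
    ["bank", "interest", "rate"],
    ["football", "match", "goal"],
]
pairs = frequent_pairs(docs, min_support=0.5)
index_terms = expand_index_terms({"loan"}, pairs)
```

With these toy documents, only the pairs ("bank", "interest") and ("bank", "loan") reach 50 per cent support, so a document indexed under "loan" would also gain "bank" as an index term.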

Findings

The proposed methodology shows significant enhancement in terms of accuracy, efficiency and reliability when compared to previous works.

Research limitations/implications

The stemming algorithm can be further enhanced: Arabic has many grammatical rules, and the more of them are integrated into the stemming algorithm, the better the stemming will be. The stop-list can also be improved by adding more words that should not be taken into consideration in the indexing mechanism. Numbers should be added to the list as well, and a thesaurus system should be used, because it links different phrases or words with the same meaning to each other, which improves the indexing mechanism. The authors also invite researchers to add more pre-requisite texts to obtain better results.

Originality/value

In this paper, the authors present a full text-based auto-indexing method for Arabic text documents. The auto-indexing method extracts new relevant words by using data mining rules, which has not been investigated before. The method uses an association rule mining algorithm to extract frequent sets of related items, and thereby relationships between words in the texts to be indexed and words from texts that belong to the same category. The benefits of the method are demonstrated using empirical work involving several Arabic texts.

Details

Library Hi Tech, vol. 37 no. 1
Type: Research Article
ISSN: 0737-8831

Article
Publication date: 3 April 2020

Abdelhalim Saadi and Hacene Belhadef

Abstract

Purpose

The purpose of this paper is to present a system based on deep neural networks to extract particular entities from natural language text, given that a massive amount of textual information is electronically available at present. Notably, this large volume of electronic text makes it very difficult to find or extract relevant information.

Design/methodology/approach

This study presents an original system to extract Arabic named entities by combining a deep neural network-based part-of-speech tagger and a neural network-based named entity extractor. Firstly, the system extracts the grammatical classes of the words with high precision, depending on the context of each word. This module plays the role of the disambiguation process. Then, a second module is used to extract the named entities.

Findings

Using deep neural networks in natural language processing requires tuning many hyperparameters, which is a time-consuming process. To deal with this problem, statistical methods such as the Taguchi method are in high demand. In this study, the system is successfully applied to Arabic named entity recognition, where an accuracy of 96.81 per cent was reported, which is better than the state-of-the-art results.
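To give a sense of how a Taguchi-style design can cut down hyperparameter trials, the sketch below uses the standard L4 orthogonal array to test three two-level factors in four runs instead of eight, then picks the level with the better mean score for each factor. The abstract does not say which array, factors or levels the authors used; the hyperparameter names, their values and the toy scoring function here are purely hypothetical.

```python
# Standard L4 orthogonal array: 4 runs covering 3 two-level factors.
# Every pair of columns contains each level combination exactly once.
L4 = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]


def best_levels(factors, evaluate):
    """Run the L4 design and return the better level for each factor.

    factors maps a factor name to its two candidate values;
    evaluate maps a full setting (dict) to a score where higher is better.
    """
    if len(factors) != 3 or any(len(v) != 2 for v in factors.values()):
        raise ValueError("this sketch handles exactly three two-level factors")
    names = list(factors)
    results = []
    for row in L4:
        setting = {name: factors[name][level] for name, level in zip(names, row)}
        results.append((row, evaluate(setting)))
    best = {}
    for index, name in enumerate(names):
        # Main effect: mean score over the runs at each level of this factor.
        means = []
        for level in (0, 1):
            scores = [score for row, score in results if row[index] == level]
            means.append(sum(scores) / len(scores))
        best[name] = factors[name][0 if means[0] >= means[1] else 1]
    return best


def toy_score(setting):
    """Stand-in for training a model and measuring validation accuracy."""
    score = 0.0
    score += 0.10 if setting["learning_rate"] == 0.01 else 0.0
    score += 0.20 if setting["hidden_units"] == 128 else 0.0
    score += 0.05 if setting["dropout"] == 0.5 else 0.0
    return score


factors = {
    "learning_rate": (0.1, 0.01),
    "hidden_units": (64, 128),
    "dropout": (0.2, 0.5),
}
chosen = best_levels(factors, toy_score)
```

Because the toy score is purely additive, the main-effects analysis recovers the best setting exactly; with a real model, interactions between hyperparameters can make the result only approximate.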

Research limitations/implications

The system is designed and trained for the Arabic language, but the architecture can be used for other languages.

Practical implications

Information extraction systems are developed for different applications, such as analysing newspaper articles and databases for commercial, political and social objectives. Information extraction systems also can be built over an information retrieval (IR) system. The IR system eliminates irrelevant documents and paragraphs.

Originality/value

The proposed system can be regarded as the first attempt to use double deep neural networks to increase the accuracy. It also can be built over an IR system. The IR system eliminates irrelevant documents and paragraphs. This process reduces the large number of documents from which the authors wish to extract the relevant information using an information extraction system.

Details

Smart and Sustainable Built Environment, vol. 9 no. 4
Type: Research Article
ISSN: 2046-6099

Article
Publication date: 18 April 2017

Mahmoud Al-Ayyoub, Ahmed Alwajeeh and Ismail Hmeidi

Abstract

Purpose

The authorship authentication (AA) problem is concerned with correctly attributing a text document to its corresponding author. Historically, this problem has been the subject of various studies built on the intuitive idea that each author has a unique style that can be captured using stylometric features (SF). Another approach to this problem, known as the bag-of-words (BOW) approach, uses keyword occurrences/frequencies in each document to identify its author. Unlike the first one, this approach is more language-independent. This paper aims to study and compare both approaches, focusing on the Arabic language, which is still largely understudied despite its importance.

Design/methodology/approach

As this is a supervised learning problem, the authors start by collecting a very large data set of Arabic documents to be used for training and testing purposes. For the SF approach, they compute hundreds of SF, whereas, for the BOW approach, the popular term frequency-inverse document frequency technique is used. Both approaches are compared under various settings.
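For readers unfamiliar with term frequency-inverse document frequency weighting, the sketch below computes one common variant: relative term frequency multiplied by the natural logarithm of the inverse document frequency. The abstract does not state which weighting variant or tooling the authors used, so the formula choice, the function name and the toy tokens are assumptions.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {word: weight} dict per tokenised document.

    weight = (count of word in doc / doc length) * ln(N / document frequency)
    """
    n = len(documents)
    document_frequency = Counter()
    for words in documents:
        # Count each word once per document.
        document_frequency.update(set(words))
    vectors = []
    for words in documents:
        counts = Counter(words)
        total = len(words)
        vectors.append({
            word: (count / total) * math.log(n / document_frequency[word])
            for word, count in counts.items()
        })
    return vectors


# Hypothetical tokenised documents by two different authors.
docs = [["a", "b", "a"], ["a", "c"]]
vectors = tf_idf(docs)
```

A word that appears in every document, such as "a" here, receives a weight of zero, whereas words specific to one document ("b", "c") keep a positive weight and therefore carry the signal a classifier could use to separate authors.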

Findings

The results show that the SF approach, which is much cheaper to train, can generate more accurate results under most settings.

Practical implications

Efficiently solving the AA problem offers numerous advantages in different fields of academia and industry, including literature, security, forensics, and electronic markets and trading. Another practical implication of this work is the public release of its sources. Specifically, some of the SF can be very useful for other problems such as sentiment analysis.

Originality/value

This is the first study of its kind to compare the SF and BOW approaches for authorship analysis of Arabic articles. Moreover, many of the computed SF are novel, while other features are inspired by the literature. As SF are language-dependent and most existing papers focus on English, extra effort must be invested to adapt such features to Arabic text.

Details

International Journal of Web Information Systems, vol. 13 no. 1
Type: Research Article
ISSN: 1744-0084

Article
Publication date: 1 December 1999

Katia Medawar

Abstract

Developing software for processing bibliographic materials in the Arabic language is a relatively recent development. When libraries in parts of the Middle East, where Arabic is the main language, started automating their collections, most library systems did not provide for the use of Arabic script and this capability had to be developed. Automated library systems started to emerge (like Minisis, ALEPH, Dobis/Libis, TinLib, OLIB) to fill the gap for non‐Roman scripts. This article describes the stages the American University of Beirut Libraries went through in converting their Arabic materials for use in the OLIB7 library management system. A description of the background of the library is given along with the details of the romanisation process, the conversion process, the software and hardware chosen, the testing of the database, problems encountered, output and the handling of authority records.

Details

Program, vol. 33 no. 4
Type: Research Article
ISSN: 0033-0337

Article
Publication date: 6 January 2022

Hanan Alghamdi and Ali Selamat

Abstract

Purpose

With the proliferation of terrorist/extremist websites on the World Wide Web, it has become progressively more crucial to detect and analyze the content on these websites. Accordingly, the volume of previous research focused on identifying the techniques and activities of terrorist/extremist groups, as revealed by their sites on the so-called dark web, has also grown.

Design/methodology/approach

This study presents a review of the techniques used to detect and process the content of terrorist/extremist sites on the dark web. Forty of the most relevant data sources were examined, and various techniques were identified among them.

Findings

Based on this review, it was found that methods of feature selection and feature extraction can be used as topic modeling with content analysis and text clustering.

Originality/value

The review concludes by presenting the current state of the art and certain open issues associated with Arabic dark web content analysis.

Details

Data Technologies and Applications, vol. 56 no. 4
Type: Research Article
ISSN: 2514-9288

Open Access
Article
Publication date: 28 November 2017

Mansoor Alghamdi and William Teahan

Abstract

Purpose

The aim of this paper is to experimentally evaluate the effectiveness of the state-of-the-art printed Arabic text recognition systems to determine open areas for future improvements. In addition, this paper proposes a standard protocol with a set of metrics for measuring the effectiveness of Arabic optical character recognition (OCR) systems to assist researchers in comparing different Arabic OCR approaches.

Design/methodology/approach

This paper describes an experiment to automatically evaluate four well-known Arabic OCR systems using a set of performance metrics. The evaluation experiment is conducted on a publicly available printed Arabic dataset comprising 240 text images with a variety of resolution levels, font types, font styles and font sizes.
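The abstract does not list the performance metrics used. One metric widely used for OCR evaluation is the character error rate, that is, the edit distance between the recognised text and the ground truth divided by the ground-truth length. The sketch below is an assumption about what such a metric could look like, not the paper's protocol; the function names and the sample strings are illustrative.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            current.append(min(
                previous[j] + 1,         # delete ref_char
                current[j - 1] + 1,      # insert hyp_char
                previous[j - 1] + cost,  # match or substitute
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference, hypothesis):
    """Edit distance normalised by the length of the ground-truth text."""
    if not reference:
        raise ValueError("reference text must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Ground truth has four letters; the hypothetical OCR output dropped one.
cer = character_error_rate("كتاب", "كتب")
```

Python compares strings by Unicode code point, so Arabic diacritics and presentation forms count as separate characters. In practice, both texts would likely need the same Unicode normalisation before scoring.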

Findings

The experimental results show that the field of character recognition for printed Arabic still requires further research to reach an efficient text recognition method for Arabic script.

Originality/value

To the best of the authors’ knowledge, this is the first work that provides a comprehensive automated evaluation of Arabic OCR systems with respect to the characteristics of Arabic script and, in addition, proposes an evaluation methodology that can be used as a benchmark by researchers and therefore will contribute significantly to the enhancement of the field of Arabic script recognition.

Details

PSU Research Review, vol. 1 no. 3
Type: Research Article
ISSN: 2399-1747

Article
Publication date: 1 April 1992

Joan M. Aliprand

Abstract

Arabic script is the most recent addition to the scripts available on the Research Libraries Information Network (RLIN). Bibliographic control and retrieval using the authentic writing system are available for titles in Arabic, Persian (Farsi), Urdu, Ottoman Turkish, and other languages written with Arabic script. RLIN is the world's largest bibliographic database for Middle Eastern language material. This paper is a comprehensive description of the Arabic script features of RLIN. It covers Arabic character sets and RLIN's character repertoire for Arabic script; how Arabic characters are input and stored in the RLIN database; the equipment needed for Arabic script support; the indexing, retrieval, and presentation of records containing Arabic script; the inclusion of non‐Roman data in USMARC bibliographic records; and statistics on the RLIN databases. Sidebars explain features of Arabic writing. The discussion of data storage and presentation of text is relevant to any computer application that involves Arabic script.

Details

Library Hi Tech, vol. 10 no. 4
Type: Research Article
ISSN: 0737-8831

Open Access
Article
Publication date: 1 December 2016

Jennifer Ball and Muna Kashoob

Abstract

Most teachers in the Gulf would agree that Arab learners struggle more with reading and writing than listening and speaking. One little-considered possible influence on this is the particular visual processing requirements of English. This article suggests why visual processing or visual cognition might be a particular difficulty for Arab students reading English. It offers a simple classroom checklist that may help teachers to notice if visual processing strain could be affecting their students' attention, motivation and performance.

Details

Learning and Teaching in Higher Education: Gulf Perspectives, vol. 13 no. 2
Type: Research Article
ISSN: 2077-5504

Article
Publication date: 22 March 2022

Djamila Mohdeb, Meriem Laifa, Fayssal Zerargui and Omar Benzaoui

Abstract

Purpose

The present study was designed to investigate eight research questions related to the analysis and detection of dialectal Arabic hate speech that targeted African refugees and illegal migrants in the Algerian YouTube space.

Design/methodology/approach

The transfer learning approach, which currently represents the state of the art in natural language processing tasks, has been used to classify and detect hate speech in Algerian dialectal Arabic. In addition, a descriptive analysis has been conducted to answer the analytical research questions that aim at measuring and evaluating the presence of the anti-refugee/migrant discourse on the YouTube social platform.

Findings

Data analysis revealed that there has been a gradual modest increase in the number of anti-refugee/migrant hateful comments on YouTube since 2014, a sharp rise in 2017 and a sharp decline in later years until 2021. Furthermore, our findings stemming from classifying hate content using multilingual and monolingual pre-trained language transformers demonstrate a good performance of the AraBERT monolingual transformer in comparison with the monodialectal transformer DziriBERT and the cross-lingual transformers mBERT and XLM-R.

Originality/value

Automatic hate speech detection in languages other than English is quite a challenging task that the literature has tried to address with various machine learning approaches. Although the recent approach of cross-lingual transfer learning offers a promising solution, tackling this problem in the context of the Arabic language, particularly dialectal Arabic, makes it even more challenging. Our results cast a new light on the actual ability of the transfer learning approach to deal with low-resource languages that widely differ from high-resource languages as well as other Latin-based, low-resource languages.

Details

Aslib Journal of Information Management, vol. 74 no. 6
Type: Research Article
ISSN: 2050-3806
