Search results

Article
Publication date: 2 September 2019

Guellil Imane, Darwish Kareem and Azouaou Faical

Abstract

Purpose

This paper aims to propose an approach to automatically annotate a large corpus in Arabic dialect. This corpus is used to analyse the sentiments of Arabic users on social media. It focuses on the Algerian dialect, which is a sub-dialect of Maghrebi Arabic. Although Algerian is spoken by roughly 40 million speakers, few studies address its automated processing in general and its sentiment analysis in particular.

Design/methodology/approach

The approach is based on the construction and use of a sentiment lexicon to automatically annotate a large corpus of Algerian text extracted from Facebook. This approach significantly increases the size of the training corpus without resorting to manual annotation. The annotated corpus is then vectorized using document embedding (doc2vec), which is an extension of word embeddings (word2vec). For sentiment classification, the authors used different classifiers such as support vector machines (SVM), Naive Bayes (NB) and logistic regression (LR).
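As a rough illustration of lexicon-based automatic annotation, the sketch below labels a message only when the lexicon evidence is strong enough. The abstract does not say how the score or the selection threshold is computed, so the toy `LEXICON`, the `annotate` function and the confidence formula (net polarity divided by total polarity mass) are all assumptions made for this example, not the authors' method.

```python
from typing import Dict, Optional

# Hypothetical toy lexicon (word -> polarity); a real one would hold dialect terms.
LEXICON: Dict[str, float] = {"good": 1.0, "great": 1.0, "bad": -1.0, "awful": -1.0}


def annotate(message: str, lexicon: Dict[str, float], threshold: float = 0.6) -> Optional[str]:
    """Return 'positive' or 'negative', or None if the message is too ambiguous.

    Assumed confidence measure: |sum of polarities| / sum of |polarities|.
    """
    scores = [lexicon[token] for token in message.lower().split() if token in lexicon]
    total = sum(abs(score) for score in scores)
    if total == 0:
        return None  # no lexicon word found, so the message is left unlabelled
    net = sum(scores)
    confidence = abs(net) / total
    if confidence < threshold:
        return None  # mixed evidence, so the message is excluded from training
    return "positive" if net > 0 else "negative"


if __name__ == "__main__":
    for text in ["good great movie", "good bad", "awful bad service"]:
        print(text, "->", annotate(text, LEXICON))
```

Messages that receive `None` would simply be left out of the automatically built training set, which is one way a threshold can trade corpus size against label quality.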

Findings

The results suggest that NB and SVM classifiers generally led to the best results and MLP generally had the worst results. Further, the threshold that the authors use in selecting messages for the training set had a noticeable impact on recall and precision, with a threshold of 0.6 producing the best results. Using PV-DBOW led to slightly higher results than using PV-DM. Combining PV-DBOW and PV-DM representations led to slightly lower results than using PV-DBOW alone. The best results were obtained by the NB classifier with F1 up to 86.9 per cent.

Originality/value

The principal originality of this paper is to determine the right parameters for automatically annotating an Algerian dialect corpus. This annotation is based on a sentiment lexicon that was also constructed automatically.

Details

International Journal of Web Information Systems, vol. 15 no. 5
Type: Research Article
ISSN: 1744-0084

Article
Publication date: 6 January 2022

Hanan Alghamdi and Ali Selamat

Abstract

Purpose

With the proliferation of terrorist/extremist websites on the World Wide Web, it has become progressively more crucial to detect and analyze the content on these websites. Accordingly, the volume of previous research focused on identifying the techniques and activities of terrorist/extremist groups, as revealed by their sites on the so-called dark web, has also grown.

Design/methodology/approach

This study presents a review of the techniques used to detect and process the content of terrorist/extremist sites on the dark web. Forty of the most relevant data sources were examined, and various techniques were identified among them.

Findings

Based on this review, it was found that methods of feature selection and feature extraction can be used as topic modeling with content analysis and text clustering.

Originality/value

The review concludes by presenting the current state of the art and certain open issues associated with Arabic dark web content analysis.

Details

Data Technologies and Applications, vol. 56 no. 4
Type: Research Article
ISSN: 2514-9288

Open Access
Article
Publication date: 29 June 2022

Ibtissam Touahri

Abstract

Purpose

This paper proposes a multi-facet sentiment analysis system.

Design/methodology/approach

This paper uses multidomain resources to build a sentiment analysis system. The manual lexicon-based features extracted from the resources are fed into a machine learning classifier so that their performance can be compared. The manual lexicon is then replaced with a custom BOW to avoid its time-consuming construction. To help the system run faster and make the model interpretable, the feature set is reduced by employing different existing and custom approaches such as term occurrence, information gain, principal component analysis, semantic clustering and POS tagging filters.

Findings

The proposed system, which features automated lexicon extraction and feature-size optimization, proved efficient when applied to multidomain and benchmark datasets, reaching 93.59% accuracy, which makes it competitive with state-of-the-art systems.

Originality/value

The originality lies in the construction of a custom BOW and in optimizing features using existing and custom feature selection and clustering approaches.

Details

Applied Computing and Informatics, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 2634-1964

Article
Publication date: 27 February 2023

Meriem Laifa and Djamila Mohdeb

Abstract

Purpose

This study provides an overview of the application of sentiment analysis (SA) in exploring social movements (SMs). It also compares different models for an SA task on Algerian Arabic tweets related to the early days of the Algerian SM, called Hirak.

Design/methodology/approach

Related tweets were retrieved using relevant hashtags, followed by multiple data cleaning procedures. Foundational machine learning methods such as Naive Bayes, Support Vector Machine, Logistic Regression (LR) and Decision Tree were implemented. For each classifier, two feature extraction techniques were used and compared, namely Bag of Words and Term Frequency–Inverse Document Frequency. Moreover, three fine-tuned pretrained transformers, AraBERT, DziriBERT and the multilingual XLM-R, were used for the comparison.
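To make the difference between the two feature extraction techniques concrete, here is a minimal standard-library sketch. It uses the textbook weighting tf × log(N/df) as an assumption; the abstract does not state which TF-IDF variant or library the study used, and common toolkits apply smoothed variants. The function names `bag_of_words` and `tf_idf` are illustrative only.

```python
import math
from collections import Counter
from typing import Dict, List


def bag_of_words(document: str) -> Counter:
    """Raw term counts for one whitespace-tokenized document."""
    return Counter(document.lower().split())


def tf_idf(documents: List[str]) -> List[Dict[str, float]]:
    """Textbook TF-IDF: (count / doc length) * log(N / document frequency)."""
    counts = [bag_of_words(doc) for doc in documents]
    n_docs = len(documents)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for doc_counts in counts for term in doc_counts)
    weighted = []
    for doc_counts in counts:
        length = sum(doc_counts.values())
        weighted.append({
            term: (freq / length) * math.log(n_docs / doc_freq[term])
            for term, freq in doc_counts.items()
        })
    return weighted


if __name__ == "__main__":
    docs = ["hirak hirak peace", "hirak protest"]
    print(bag_of_words(docs[0]))
    print(tf_idf(docs))
```

Bag of Words keeps a frequent term's raw count, whereas this TF-IDF variant gives zero weight to a term that occurs in every document, since such a term does not help tell documents apart.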

Findings

The findings of this paper emphasize the vital role social media played during the Hirak. Results revealed that most individuals had a positive attitude toward the Hirak. Moreover, the presented experiments provided important insights into the possible use of both basic machine learning and transfer learning models to perform SA on Algerian text datasets. When comparing machine learning models with transformers in terms of accuracy, precision, recall and F1-score, the results are fairly similar, with LR outperforming all models with a 68 per cent accuracy rate.

Originality/value

At the time of writing, the Algerian SM was not thoroughly investigated or discussed in the Computer Science literature. This analysis makes a limited but unique contribution to understanding the Algerian Hirak using artificial intelligence. This study proposes what it considers to be a unique basis for comprehending this event with the goal of generating a foundation for future studies by comparing different SA techniques on a low-resource language.

Details

Data Technologies and Applications, vol. 57 no. 5
Type: Research Article
ISSN: 2514-9288

Article
Publication date: 16 August 2021

Nael Alqtati, Jonathan A.J. Wilson and Varuna De Silva

Abstract

Purpose

This paper aims to equip professionals and researchers in the fields of advertising, branding, public relations, marketing communications, social media analytics and marketing with a simple, effective and dynamic means of evaluating consumer behavioural sentiments and engagement through Arabic language and script, in vivo.

Design/methodology/approach

Using quantitative and qualitative situational linguistic analyses of Classical Arabic, found in Quranic and religious texts; Modern Standard Arabic, which is commonly used in formal Arabic channels; and dialectal Arabic, which varies hugely from one Arabic country to another, this study analyses rich marketing and consumer messages (tweets) as a basis for developing an Arabic language social media methodological tool.

Findings

Despite the popularity of Arabic language communication on social media platforms across geographies, currently, comprehensive language processing toolkits for analysing Arabic social media conversations have limitations and require further development. Furthermore, due to its unique morphology, developing text understanding capabilities specific to the Arabic language poses challenges.

Practical implications

This study demonstrates the application and effectiveness of the proposed methodology on a random sample of Twitter data from Arabic-speaking regions. Furthermore, as Arabic is the language of Islam, the study is of particular importance to Islamic and Muslim geographies, markets and marketing.

Social implications

The findings suggest that the proposed methodology has a wider potential beyond the data set and health-care sector analysed, and therefore, can be applied to further markets, social media platforms and consumer segments.

Originality/value

To remedy these gaps, this study presents a new methodology and analytical approach to investigating Arabic language social media conversations, which brings together a multidisciplinary knowledge of technology, data science and marketing communications.

Article
Publication date: 7 November 2016

Ismail Hmeidi, Mahmoud Al-Ayyoub, Nizar A. Mahyoub and Mohammed A. Shehab

Abstract

Purpose

Multi-label Text Classification (MTC) is one of the most recent research trends in the data mining and information retrieval domains for many reasons, such as the rapid growth of online data and the increasing tendency of internet users to be more comfortable with assigning multiple labels/tags to describe documents, emails, posts, etc. The dimensionality of labels makes MTC more difficult and challenging than traditional single-labeled text classification (TC). Because MTC is a natural extension of TC, several ways have been proposed to benefit from the rich literature of TC through what are called problem transformation (PT) methods. Basically, PT methods transform the multi-label data into single-label data suitable for traditional single-label classification algorithms. Another approach is to design novel classification algorithms customized for MTC. Over the past decade, several works have appeared on both approaches, focusing mainly on the English language. This work aims to present an elaborate study of MTC of Arabic articles.

Design/methodology/approach

This paper presents a novel lexicon-based method for MTC, where the keywords that are most associated with each label are extracted from the training data along with a threshold that can later be used to determine whether each test document belongs to a certain label.
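A lexicon-based multi-label classifier of this general kind might be sketched as follows. The abstract does not describe how keywords are ranked or how thresholds are derived, so this example makes two assumptions: each label's lexicon is its `top_k` most frequent training words, and a document receives a label when the fraction of that label's keywords it contains reaches a fixed `threshold`. All names here are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def build_label_lexicons(train: List[Tuple[str, Set[str]]], top_k: int = 3) -> Dict[str, Set[str]]:
    """Collect, for each label, the top_k most frequent words of its training documents."""
    counts: Dict[str, Counter] = {}
    for text, labels in train:
        for label in labels:
            counts.setdefault(label, Counter()).update(text.lower().split())
    return {label: {word for word, _ in c.most_common(top_k)} for label, c in counts.items()}


def predict_labels(text: str, lexicons: Dict[str, Set[str]], threshold: float = 0.5) -> Set[str]:
    """Assign every label whose keyword coverage in the document meets the threshold."""
    tokens = set(text.lower().split())
    predicted = set()
    for label, keywords in lexicons.items():
        if not keywords:
            continue  # a label with no keywords can never be matched
        coverage = len(tokens & keywords) / len(keywords)
        if coverage >= threshold:
            predicted.add(label)
    return predicted


if __name__ == "__main__":
    train = [
        ("goal match team", {"sports"}),
        ("vote election party", {"politics"}),
        ("team election budget", {"sports", "politics"}),
    ]
    lexicons = build_label_lexicons(train)
    print(predict_labels("election vote team goal", lexicons))
```

Because each label is tested independently against its own lexicon, a document can receive zero, one or several labels, which is what distinguishes this from single-label classification.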

Findings

The experiments show that the presented approach outperforms the currently available approaches. Specifically, the results of our experiments show that the best accuracy obtained from existing approaches is only 18 per cent, whereas the presented lexicon-based approach can reach an accuracy of 31 per cent.

Originality/value

Although there exist some tools that can be customized to address the MTC problem for Arabic text, their accuracies are very low when applied to Arabic articles. This paper presents a novel method for MTC. The experiments show that the presented approach outperforms the currently available approaches.

Details

International Journal of Web Information Systems, vol. 12 no. 4
Type: Research Article
ISSN: 1744-0084

Article
Publication date: 18 March 2021

Pandiaraj A., Sundar C. and Pavalarajan S.

Abstract

Purpose

Recent developments in sentiment analysis have resulted in substantial growth in the volume of studies, especially on more subjective text types, namely, product or movie reviews. The key difference between these texts and news articles is that their target is defined and unique across the text. Hence, the reviews on newspaper articles can deal with three subtasks: correctly spotting the target, splitting the good and bad content from the reviews on the concerned target and evaluating different opinions provided in a detailed manner. Having defined these tasks, this paper aims to implement a new sentiment analysis model for article reviews from the newspaper.

Design/methodology/approach

Here, tweets from various newspaper articles are taken, and sentiment analysis is carried out through pre-processing, semantic word extraction, feature extraction and classification. Initially, the pre-processing phase is performed, in which steps such as stop word removal, stemming and blank space removal are carried out to produce keywords that express positive, negative or neutral sentiment. Further, semantic (similar) words are extracted from the available dictionary by matching the keywords. Next, features are extracted from the keywords and semantic words using holoentropy to obtain information statistics, which yields the maximum related information. Two categories of holoentropy features are extracted: joint holoentropy and cross holoentropy. The extracted features of all keywords are finally passed to a hybrid classifier, which merges the beneficial concepts of the neural network (NN) and the deep belief network (DBN). To improve the performance of sentiment classification, a modified rider optimization algorithm (ROA), the so-called new steering updated ROA (NSU-ROA), is introduced into the NN and DBN for weight update. The average of both improved classifiers then provides the classified sentiment as positive, negative or neutral for the reviews of newspaper articles.
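The final averaging step can be pictured with a small sketch. The abstract does not say which quantity is averaged, so this example assumes each classifier outputs one probability per class and that the ensemble averages those probabilities before picking the top class. `LABELS` and `average_ensemble` are hypothetical names, and the NN and DBN models themselves are not reproduced here.

```python
from typing import List, Sequence, Tuple

LABELS = ("positive", "negative", "neutral")


def average_ensemble(probs_a: Sequence[float], probs_b: Sequence[float]) -> Tuple[str, List[float]]:
    """Average two per-class probability vectors and return the winning label.

    probs_a and probs_b stand in for the outputs of two trained classifiers.
    """
    if len(probs_a) != len(LABELS) or len(probs_b) != len(LABELS):
        raise ValueError("each classifier must supply one probability per label")
    averaged = [(a + b) / 2 for a, b in zip(probs_a, probs_b)]
    # Index of the largest averaged probability (the first one wins on ties).
    best = max(range(len(LABELS)), key=averaged.__getitem__)
    return LABELS[best], averaged


if __name__ == "__main__":
    label, scores = average_ensemble([0.7, 0.2, 0.1], [0.2, 0.5, 0.3])
    print(label, scores)
```

Averaging lets a confident prediction from one model outweigh a hesitant prediction from the other, which is a common motivation for combining classifiers in this way.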

Findings

Three data sets were considered for experimentation. On data set 1, the developed NSU-ROA + DBN + NN attained high accuracy, which was 2.6% superior to particle swarm optimization, 3% superior to FireFly, 3.8% superior to grey wolf optimization, 5.5% superior to the whale optimization algorithm and 3.2% superior to ROA-based DBN + NN. On data set 2, the classification analysis showed that the accuracy of the proposed NSU-DBN + NN was 3.4% higher than DBN + NN, 25% higher than DBN, 28.5% higher than NN and 32.3% higher than support vector machine. Thus, the effective performance of the proposed NSU-ROA + DBN + NN on sentiment analysis of newspaper articles has been demonstrated.

Originality/value

This paper adopts the latest optimization algorithm called the NSU-ROA to effectively recognize the sentiments of the newspapers with NN and DBN. This is the first work that uses NSU-ROA-based optimization for accurate identification of sentiments from newspaper articles.

Details

Kybernetes, vol. 51 no. 1
Type: Research Article
ISSN: 0368-492X

Open Access
Article
Publication date: 31 July 2020

Omar Alqaryouti, Nur Siyam, Azza Abdel Monem and Khaled Shaalan

Abstract

Digital resources such as smart application reviews and online feedback information are important sources of customers’ feedback and input. This paper aims to help government entities gain insights into the needs and expectations of their customers. Towards this end, we propose an aspect-based sentiment analysis hybrid approach that integrates domain lexicons and rules to analyse reviews of the entities’ smart apps. The proposed model aims to extract the important aspects from the reviews and classify the corresponding sentiments. This approach adopts language processing techniques, rules and lexicons to address several sentiment analysis challenges and produce summarized results. According to the reported results, the aspect extraction accuracy improves significantly when the implicit aspects are considered. Also, the integrated classification model outperforms the lexicon-based baseline and the other rule combinations by 5% in accuracy on average. When using the same dataset, the proposed approach also outperforms machine learning approaches that use support vector machine (SVM). However, using these lexicons and rules as input features to the SVM model achieved higher accuracy than other SVM models.

Details

Applied Computing and Informatics, vol. 20 no. 1/2
Type: Research Article
ISSN: 2634-1964

Article
Publication date: 16 September 2021

Sireesha Jasti

Abstract

Purpose

The internet has undergone tremendous change with the advancement of new technologies. This change has led internet users to comment on services and products. Sentiment classification is the process of analyzing reviews to help users decide whether or not to purchase a product.

Design/methodology/approach

A rider feedback artificial tree optimization-enabled deep recurrent neural network (RFATO-enabled deep RNN) is developed for the effective classification of sentiments into various grades. The proposed RFATO algorithm is modeled by integrating the feedback artificial tree (FAT) algorithm in the rider optimization algorithm (ROA), and is used for training the deep RNN classifier for the classification of sentiments in the review data. Pre-processing consists of stemming and stop word removal, which remove redundancy for smoother processing of the data. Features including SentiWordNet-based features, a variant of term frequency-inverse document frequency (TF-IDF) features and spam word-based features are extracted from the review data to form the feature vector. Feature fusion is performed based on the entropy of the extracted features. The metrics used to evaluate the proposed RFATO algorithm are accuracy, sensitivity and specificity.

Findings

By using the proposed RFATO algorithm, the evaluation metrics such as accuracy, sensitivity and specificity are maximized when compared to the existing algorithms.

Originality/value

The proposed RFATO algorithm is modeled by integrating the FAT algorithm in the ROA, and is used for training the deep RNN classifier for the classification of sentiments in the review data. Pre-processing consists of stemming and stop word removal, which remove redundancy for smoother processing of the data. Features including SentiWordNet-based features, a variant of TF-IDF features and spam word-based features are extracted from the review data to form the feature vector. Feature fusion is performed based on the entropy of the extracted features.

Details

International Journal of Web Information Systems, vol. 17 no. 6
Type: Research Article
ISSN: 1744-0084

Article
Publication date: 29 October 2018

Shrawan Kumar Trivedi and Shubhamoy Dey

Abstract

Purpose

To be sustainable and competitive in the current business environment, it is useful to understand users’ sentiment towards products and services. This critical task can be achieved via natural language processing and machine learning classifiers. This paper aims to propose a novel probabilistic committee selection classifier (PCC) to analyse and classify the sentiment polarities of movie reviews.

Design/methodology/approach

An Indian movie review corpus is assembled for this study. Another publicly available movie review polarity corpus is also used to validate the results. The greedy stepwise search method is used to extract the features/words of the reviews. The performance of the proposed classifier is measured using different metrics, such as F-measure, false positive rate, receiver operating characteristic (ROC) curve and training time. Further, the proposed classifier is compared with other popular machine-learning classifiers, such as Bayesian, Naïve Bayes, Decision Tree (J48), Support Vector Machine and Random Forest.
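Two of the metrics named above follow directly from confusion-matrix counts. The sketch below uses the standard definitions of F-measure and false positive rate; the function names and the example counts are illustrative and are not taken from the paper.

```python
def f_measure(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """F-beta score from true positives, false positives and false negatives.

    With beta = 1 this is the harmonic mean of precision and recall (F1).
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    beta_sq = beta * beta
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


def false_positive_rate(fp: int, tn: int) -> float:
    """Share of actual negatives that were wrongly classified as positive."""
    return fp / (fp + tn) if (fp + tn) else 0.0


if __name__ == "__main__":
    # Made-up counts for a binary positive/negative review classifier.
    print(f_measure(tp=8, fp=2, fn=2))
    print(false_positive_rate(fp=2, tn=8))
```

A low false positive rate means few negative reviews are mistaken for positive ones, which is why it is reported alongside F-measure rather than in place of it.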

Findings

The results of this study show that the proposed classifier is good at predicting the positive or negative polarity of movie reviews. The performance accuracy and the ROC curve value of the PCC are found to be the most suitable among all classifiers tested in this study. This classifier is also found to be efficient at identifying positive sentiments of reviews, where it gives low false positive rates for both the Indian Movie Review and Review Polarity corpora used in this study. The training time of the proposed classifier is found to be slightly higher than that of Bayesian, Naïve Bayes and J48.

Research limitations/implications

Only movie review sentiments written in English are considered. In addition, the proposed committee selection classifier is prepared only using the committee of probabilistic classifiers; however, other classifier committees can also be built, tested and compared with the present experiment scenario.

Practical implications

In this paper, a novel probabilistic approach is proposed and used for classifying movie reviews, and is found to be highly effective in comparison with other state-of-the-art classifiers. This classifier may be tested for different applications and may provide new insights for developers and researchers.

Social implications

The proposed PCC may be used to classify different product reviews, and hence may be beneficial to organizations to justify users’ reviews about specific products or services. By using authentic positive and negative sentiments of users, the credibility of the specific product, service or event may be enhanced. PCC may also be applied to other applications, such as spam detection, blog mining, news mining and various other data-mining applications.

Originality/value

The constructed PCC is novel and was tested on Indian movie review data.
