Search results
1 – 10 of 128
Imane Guellil, Ahsan Adeel, Faical Azouaou, Sara Chennoufi, Hanene Maafi and Thinhinane Hamitouche
Abstract
Purpose
This paper aims to propose an approach for hate speech detection against politicians in the Arabic community on social media (e.g. YouTube). In the literature, similar works have been presented for other languages such as English. However, to the best of the authors’ knowledge, not much work has been conducted in the Arabic language.
Design/methodology/approach
This approach uses both classical classification algorithms and deep learning algorithms. For the classical algorithms, the authors use Gaussian NB (GNB), Logistic Regression (LR), Random Forest (RF), SGD Classifier (SGD) and Linear SVC (LSVC). For the deep learning classification, four different algorithms are applied: convolutional neural network (CNN), multilayer perceptron (MLP), long short-term memory (LSTM) and bidirectional long short-term memory (Bi-LSTM). For extracting features, the authors use both Word2vec and FastText with their two implementations, namely, Skip Gram (SG) and Continuous Bag of Words (CBOW).
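To illustrate the difference between the SG and CBOW variants named above, the following minimal sketch builds the training examples each variant would see for one tokenized comment. It is not the authors' pipeline: the function names, the sample sentence and the window size are hypothetical, and no embedding is actually trained.

```python
def skipgram_pairs(tokens, window=2):
    """Skip Gram: every (center, context) pair is a separate training example."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def cbow_examples(tokens, window=2):
    """CBOW: the whole context window is used to predict the center word."""
    examples = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        if context:
            examples.append((context, center))
    return examples


# Hypothetical tokenized comment, used only for illustration.
tokens = ["this", "comment", "targets", "a", "politician"]
sg = skipgram_pairs(tokens, window=1)
cbow = cbow_examples(tokens, window=1)
```

SG yields one example per word pair, while CBOW yields one example per position, which is the structural difference between the two implementations.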
Findings
Simulation results show that LSVC, Bi-LSTM and MLP perform best, achieving an accuracy of up to 91% when combined with the SG model. The results also show that classification on a balanced corpus is more accurate than classification on an unbalanced corpus.
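The abstract does not say how the balanced corpus was obtained. As one common possibility, the sketch below balances a labelled corpus by random undersampling; the function name, the labels and the toy corpus are assumptions made for illustration only.

```python
import random
from collections import defaultdict


def undersample(samples, seed=0):
    """Randomly drop examples so every label keeps as many as the rarest label."""
    by_label = defaultdict(list)
    for text, label in samples:
        by_label[label].append((text, label))
    target = min(len(group) for group in by_label.values())
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    balanced = []
    for group in by_label.values():
        balanced.extend(rng.sample(group, target))
    rng.shuffle(balanced)
    return balanced


# Hypothetical unbalanced corpus: 3 hateful and 9 non-hateful comments.
corpus = [(f"comment {i}", "hateful") for i in range(3)]
corpus += [(f"comment {i}", "non-hateful") for i in range(3, 12)]
balanced = undersample(corpus)
```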
Originality/value
The principal originality of this paper is the construction of a new hate speech corpus (Arabic_fr_en), which was annotated by three different annotators. This corpus contains the three languages used by Arabic people: Arabic, French and English. For Arabic, the corpus contains both Arabic script and Arabizi (i.e. Arabic words written with Latin letters). Another originality is the reliance on both shallow and deep learning classification, using different feature extraction models such as Word2vec and FastText with their two implementations, SG and CBOW.
Djamila Mohdeb, Meriem Laifa, Fayssal Zerargui and Omar Benzaoui
Abstract
Purpose
The present study was designed to investigate eight research questions that are related to the analysis and the detection of dialectal Arabic hate speech that targeted African refugees and illegal migrants in the Algerian YouTube space.
Design/methodology/approach
The transfer learning approach, which currently represents the state of the art in natural language processing tasks, has been exploited to classify and detect hate speech in Algerian dialectal Arabic. In addition, a descriptive analysis has been conducted to answer the analytical research questions that aim at measuring and evaluating the presence of the anti-refugee/migrant discourse on the YouTube social platform.
Findings
Data analysis revealed that there has been a gradual modest increase in the number of anti-refugee/migrant hateful comments on YouTube since 2014, a sharp rise in 2017 and a sharp decline in later years until 2021. Furthermore, our findings stemming from classifying hate content using multilingual and monolingual pre-trained language transformers demonstrate a good performance of the AraBERT monolingual transformer in comparison with the monodialectal transformer DziriBERT and the cross-lingual transformers mBERT and XLM-R.
Originality/value
Automatic hate speech detection in languages other than English is quite a challenging task that the literature has tried to address with various machine learning approaches. Although the recent approach of cross-lingual transfer learning offers a promising solution, tackling this problem in the context of the Arabic language, particularly dialectal Arabic, makes it even more challenging. Our results cast a new light on the actual ability of the transfer learning approach to deal with low-resource languages that widely differ from high-resource languages as well as other Latin-based, low-resource languages.
Femi Emmanuel Ayo, Olusegun Folorunso, Friday Thomas Ibharalu and Idowu Ademola Osinuga
Abstract
Purpose
Hate speech is an expression of intense hatred. Twitter has become a popular analytical tool for the prediction and monitoring of abusive behaviors. Hate speech detection with social media data has witnessed special research attention in recent studies, hence, the need to design a generic metadata architecture and efficient feature extraction technique to enhance hate speech detection.
Design/methodology/approach
This study proposes hybrid embeddings enhanced with a topic inference method and an improved cuckoo search neural network for hate speech detection in Twitter data. The proposed method uses a hybrid embeddings technique that includes Term Frequency-Inverse Document Frequency (TF-IDF) for word-level feature extraction and Long Short-Term Memory (LSTM), a variant of the recurrent neural network architecture, for sentence-level feature extraction. The features extracted from the hybrid embeddings then serve as input to the improved cuckoo search neural network, which predicts whether a tweet is hate speech, offensive language or neither.
Findings
The proposed method showed better results when tested on the collected Twitter datasets compared to other related methods. In order to validate the performances of the proposed method, t-test and post hoc multiple comparisons were used to compare the significance and means of the proposed method with other related methods for hate speech detection. Furthermore, Paired Sample t-Test was also conducted to validate the performances of the proposed method with other related methods.
Research limitations/implications
Finally, the evaluation results showed that the proposed method outperforms other related methods with a mean F1-score of 91.3.
Originality/value
The main novelty of this study is the use of an automatic topic spotting measure based on a naïve Bayes model to improve feature representation.
Hanan Alghamdi and Ali Selamat
Abstract
Purpose
With the proliferation of terrorist/extremist websites on the World Wide Web, it has become progressively more crucial to detect and analyze the content on these websites. Accordingly, the volume of previous research focused on identifying the techniques and activities of terrorist/extremist groups, as revealed by their sites on the so-called dark web, has also grown.
Design/methodology/approach
This study presents a review of the techniques used to detect and process the content of terrorist/extremist sites on the dark web. Forty of the most relevant data sources were examined, and various techniques were identified among them.
Findings
Based on this review, it was found that methods of feature selection and feature extraction can be used as topic modeling with content analysis and text clustering.
Originality/value
At the end of the review, the authors present the current state of the art and certain open issues associated with Arabic dark web content analysis.
Meriem Laifa and Djamila Mohdeb
Abstract
Purpose
This study provides an overview of the application of sentiment analysis (SA) in exploring social movements (SMs). It also compares different models for an SA task on Algerian Arabic tweets related to the early days of the Algerian SM, called Hirak.
Design/methodology/approach
Related tweets were retrieved using relevant hashtags, followed by multiple data cleaning procedures. Foundational machine learning methods such as Naive Bayes, Support Vector Machine, Logistic Regression (LR) and Decision Tree were implemented. For each classifier, two feature extraction techniques were used and compared, namely Bag of Words and Term Frequency–Inverse Document Frequency. Moreover, three fine-tuned pretrained transformers, AraBERT, DziriBERT and the multilingual XLM-R, were used for the comparison.
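The two feature extraction techniques compared above can be contrasted with a small sketch. It uses a plain, unsmoothed TF-IDF definition (library implementations often apply smoothing and normalization, so their numbers differ), and the three toy documents are hypothetical stand-ins for cleaned tweets.

```python
import math
from collections import Counter


def bag_of_words(docs):
    """Bag of Words: raw term counts per document."""
    return [Counter(doc.split()) for doc in docs]


def tf_idf(docs):
    """TF-IDF: relative term frequency times log inverse document frequency."""
    counts = bag_of_words(docs)
    n_docs = len(docs)
    # Iterating a Counter yields each distinct term once, so this counts documents.
    doc_freq = Counter(term for c in counts for term in c)
    weighted = []
    for c in counts:
        total = sum(c.values())
        weighted.append({
            term: (n / total) * math.log(n_docs / doc_freq[term])
            for term, n in c.items()
        })
    return weighted


# Hypothetical cleaned tweets, for illustration only.
docs = ["hirak hirak peaceful", "hirak protest", "peaceful protest march"]
bow = bag_of_words(docs)
weights = tf_idf(docs)
```

Bag of Words treats every occurrence equally, whereas TF-IDF down-weights terms that appear in many documents; in the third toy document, "march" (one document) outweighs "protest" (two documents).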
Findings
The findings of this paper emphasize the vital role social media played during the Hirak. Results revealed that most individuals had a positive attitude toward the Hirak. Moreover, the presented experiments provided important insights into the possible use of both basic machine learning and transfer learning models for SA of Algerian text datasets. When comparing machine learning models with transformers in terms of accuracy, precision, recall and F1-score, the results are fairly similar, with LR outperforming all models with a 68 per cent accuracy rate.
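For reference, the four evaluation measures named above can be computed as in the following sketch. It assumes a binary task for simplicity (the abstract does not state the number of sentiment classes), and the label vectors are made up for illustration.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical gold labels and predictions (1 = positive sentiment).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
scores = binary_metrics(y_true, y_pred)
```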
Originality/value
At the time of writing, the Algerian SM was not thoroughly investigated or discussed in the Computer Science literature. This analysis makes a limited but unique contribution to understanding the Algerian Hirak using artificial intelligence. This study proposes what it considers to be a unique basis for comprehending this event with the goal of generating a foundation for future studies by comparing different SA techniques on a low-resource language.
Abstract
Roots of global Terrorism are in ‘failed’ states carved out of multiracial empires after World Wars I and II in the name of ‘national self‐determination’. Both sides in the Cold War competed to exploit the process of disintegration with armed and covert interventions. In effect, they were colluding at the expense of the ‘liberated’ peoples. The ‘Vietnam Trauma’ prevented effective action against the resulting terrorist buildup and blowback until 9/11. As those vultures come home to roost, the war broadens to envision overdue but coercive reforms to the postwar system of nation states, first in the Middle East. Mirages of Vietnam blur the vision; can the sole Superpower finish the job before fiscal and/or imperial overstretch implode it?
Abstract
Purpose
This paper proposes a multi-facet sentiment analysis system.
Design/methodology/approach
This paper uses multidomain resources to build a sentiment analysis system. Manual lexicon-based features extracted from the resources are fed into machine learning classifiers, whose performance is then compared. Because the manual lexicon is time-consuming to construct, it is replaced with a custom BOW. To help the system run faster and make the model interpretable, the features are optimized using different existing and custom approaches such as term occurrence, information gain, principal component analysis, semantic clustering and POS tagging filters.
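Of the feature reduction approaches listed above, information gain is easy to show in isolation. The sketch below scores each term by how much knowing its presence reduces label entropy and keeps the top k terms; the helper names and the four toy reviews are hypothetical, and the paper's own custom filters are not reproduced.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(docs, labels, term):
    """Drop in label entropy from knowing whether `term` occurs in a document."""
    with_term = [y for d, y in zip(docs, labels) if term in d.split()]
    without = [y for d, y in zip(docs, labels) if term not in d.split()]
    remainder = sum(len(part) / len(labels) * entropy(part)
                    for part in (with_term, without) if part)
    return entropy(labels) - remainder


def top_k_terms(docs, labels, k):
    """Keep the k terms with the highest information gain."""
    vocab = sorted({t for d in docs for t in d.split()})
    ranked = sorted(vocab, key=lambda t: information_gain(docs, labels, t),
                    reverse=True)
    return ranked[:k]


# Hypothetical labelled reviews, for illustration only.
docs = ["great phone", "great battery", "poor phone", "poor screen"]
labels = ["pos", "pos", "neg", "neg"]
selected = top_k_terms(docs, labels, 2)
```

Here "great" and "poor" separate the classes perfectly and are kept, while "phone" occurs equally in both classes and carries no information.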
Findings
The proposed system, which features automated lexicon extraction and feature-size optimization, proved efficient when applied to multidomain and benchmark datasets, reaching 93.59% accuracy, which makes it competitive with state-of-the-art systems.
Originality/value
The construction of a custom BOW. Optimizing features based on existing and custom feature selection and clustering approaches.
Abstract
Purpose
Social networks (SNs) have recently evolved from a means of connecting people to becoming a tool for social engineering, radicalization, dissemination of propaganda and recruitment of terrorists. It is no secret that the majority of the Islamic State in Iraq and Syria (ISIS) members are Arabic speakers, and even the non-Arabs adopt Arabic nicknames. However, the majority of the literature researching the subject deals with non-Arabic languages. Moreover, the features involved in identifying radical Islamic content are shallow and the search or classification terms are common in daily chatter among people of the region. The authors aim at distinguishing normal conversation, influenced by the role religion plays in daily life, from terror-related content.
Design/methodology/approach
This article presents the authors' experience and the results of collecting, analyzing and classifying Twitter data from affiliated members of ISIS, as well as sympathizers. The authors used artificial intelligence (AI) and machine learning classification algorithms to categorize the tweets as terror-related, generic religious or unrelated.
Findings
The authors report the classification accuracy of the K-nearest neighbor (KNN), Bernoulli Naive Bayes (BNN) and support vector machine (SVM) [one-against-all (OAA) and all-against-all (AAA)] algorithms. The authors achieved a high classification F1 score of 83%. The work in this paper will hopefully aid more accurate classification of radical content.
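The two multiclass SVM schemes differ in how the three-way task is split into binary problems. The sketch below only enumerates those binary problems and assumes that AAA denotes the pairwise scheme often called one-vs-one; the function names are hypothetical and no classifier is trained.

```python
from itertools import combinations

# The three categories named in the abstract.
CLASSES = ["terror-related", "generic religious", "unrelated"]


def one_against_all(classes):
    """OAA: one binary problem per class, that class versus all the others."""
    return [(c, [other for other in classes if other != c]) for c in classes]


def all_against_all(classes):
    """AAA: one binary problem per unordered pair of classes."""
    return list(combinations(classes, 2))


oaa_problems = one_against_all(CLASSES)
aaa_problems = all_against_all(CLASSES)
```

With K classes, OAA needs K binary classifiers and AAA needs K(K-1)/2, so the two counts coincide for the three categories used here and diverge as K grows.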
Originality/value
In this paper, the authors have collected and analyzed thousands of tweets advocating and promoting ISIS. The authors have identified many common markers and keywords characteristic of ISIS rhetoric. Moreover, the authors have applied text processing and AI machine learning techniques to classify the tweets into one of three categories: terror-related, non-terror political chatter and news and unrelated data-polluting tweets.
Abstract
Politics has three contingent aspects, viz. a role‐play, a Majestic Art and a human social science. We may talk of the first aspect as just politicking. The second aspect was delineated by Aristotle as an art of doing vis‐a‐vis the art of making: He meant by the former a behavioural art and by the latter a productive art; and the third aspect represents a comprehensive analytical study of human social behaviour.