Combating the challenges of social media hate speech in a polarized society: A Twitter ego lexalytics approach

Collins Udanor (Department of Computer Science, University of Nigeria Nsukka, Nsukka, Nigeria)
Chinatu C. Anyanwu (Department of Computer Science, University of Nigeria Nsukka, Nsukka, Nigeria)

Data Technologies and Applications

ISSN: 2514-9288

Article publication date: 13 September 2019

Issue publication date: 22 October 2019

Abstract

Purpose

Hate speech has become a troubling development in recent times. It has different meanings to different people in different cultures. The anonymity and ubiquity of social media provide a breeding ground for hate speech and make combating it seem like a losing battle. However, what may constitute hate speech in a culturally or religiously neutral society may not be perceived as such in a polarized multi-cultural and multi-religious society like Nigeria. Defining hate speech, therefore, may be contextual. Hate speech in Nigeria may be perceived along ethnic, religious and political boundaries. The purpose of this paper is to check for the presence of hate speech on social media platforms like Twitter and, where it is present, to ask to what degree it is permissible. It also intends to find out what monitoring mechanisms social media platforms like Facebook and Twitter have put in place to combat hate speech. Lexalytics is a term coined by the authors from the words lexical analytics, for the purpose of opinion mining unstructured texts like tweets.

Design/methodology/approach

This research developed a Python program called polarized opinions sentiment analyzer (POSA), adopting an ego social network analytics technique in which an individual’s behavior is mined and described. POSA uses a customized Python N-Gram dictionary of local context-based terms that may be considered hate terms. It then used the Twitter API to stream tweets from popular and trending Nigerian Twitter handles in politics, ethnicity, religion, social activism, racism, etc., and filtered the tweets against the custom dictionary, using unsupervised classification to label the texts as either positive or negative sentiments. The outcome is visualized using tables, pie charts and word clouds. A similar implementation was also carried out in R-Studio; the two sets of results were compared, and a t-test was applied to determine whether there was a significant difference between them. The research methodology can be classified as both qualitative and quantitative: qualitative in terms of data classification, and quantitative in terms of being able to identify the results as either negative or positive from the computation of text to vector.
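The dictionary-filtering step described above can be illustrated with a minimal Python sketch. It is not the authors' POSA code: the lexicon entries, the function names (`tokenize`, `ngrams`, `classify`, `summarize`) and the rule that a single dictionary match marks a tweet as negative are all assumptions made for illustration, and the Twitter API streaming step is left out.

```python
import re
from collections import Counter

# Hypothetical placeholder lexicon. The paper's actual dictionary of
# local context-based hate terms is not given in the abstract.
HATE_NGRAMS = {
    ("badterm",),            # a unigram entry
    ("those", "people"),     # a bigram entry
    ("go", "back", "home"),  # a trigram entry
}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def ngrams(tokens, n):
    """Return every contiguous n-token tuple in the token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def classify(tweet, lexicon=HATE_NGRAMS, max_n=3):
    """Label a tweet 'negative' if any of its 1..max_n-grams is in the lexicon.

    Assumption: one match is enough; the real system may weight terms.
    """
    tokens = tokenize(tweet)
    for n in range(1, max_n + 1):
        if any(gram in lexicon for gram in ngrams(tokens, n)):
            return "negative"
    return "positive"


def summarize(tweets):
    """Return the percentage of tweets in each class for one handle."""
    counts = Counter(classify(t) for t in tweets)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: 100.0 * counts[label] / total
            for label in ("positive", "negative")}
```

Calling `summarize` on the tweets collected from one handle gives the per-handle percentages that could then feed a table or pie chart of the kind the abstract mentions.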

Findings

The findings from two sets of experiments on POSA and R are as follows. In the first experiment, the POSA software found that the Twitter handles analyzed contained between 33 and 55 percent hate contents, while the R results show hate contents ranging from 38 to 62 percent. A t-test on the positive and negative scores from POSA and R-Studio yields p-values of 0.389 and 0.289, respectively, at an α level of 0.05, implying that there is no significant difference between the results from POSA and R. In the second experiment, performed on 11 local handles with 1,207 tweets, the authors deduce that the percentage of hate contents classified by POSA is 40 percent, while the percentage classified by R is 51 percent. The accuracy of hate speech classification predicted by POSA is 87 percent, and that of free speech is 86 percent; the accuracy of hate speech classification predicted by R is 65 percent, and that of free speech is 74 percent. This study reveals that neither Twitter nor Facebook has an automated monitoring system for hate speech, and no benchmark is set to decide the level of hate contents allowed in a text. The monitoring is instead done by humans, whose assessment is usually subjective and sometimes inconsistent.

Research limitations/implications

This study establishes that hate speech is on the increase on social media. It also shows that hate mongers can be identified by the contents of their messages. The POSA system can be used as a plug-in by Twitter to detect and stop hate speech on its platform. The study was limited to public Twitter handles only. N-grams are effective features for word-sense disambiguation, but with N-grams the feature vector can take on enormous proportions, which in turn increases the sparsity of the feature vectors.
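The N-gram limitation noted above can be seen on a toy corpus. The following Python sketch is only an illustration: the three example sentences and the helper names (`doc_ngrams`, `vocabulary_and_sparsity`) are assumptions and are not drawn from the study's data or code. It builds a binary document-by-N-gram matrix implicitly and reports the vocabulary size and the fraction of zero cells.

```python
def doc_ngrams(doc, n):
    """Return the set of distinct n-grams in a whitespace-tokenized document."""
    tokens = doc.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def vocabulary_and_sparsity(docs, n):
    """Return (vocabulary size, fraction of zero cells) for a binary
    document-by-n-gram matrix built from the given documents."""
    per_doc = [doc_ngrams(doc, n) for doc in docs]
    vocab = set().union(*per_doc) if per_doc else set()
    cells = len(docs) * len(vocab)
    if cells == 0:
        return 0, 0.0
    nonzero = sum(len(grams) for grams in per_doc)
    return len(vocab), 1.0 - nonzero / cells


# Toy corpus (hypothetical sentences, for illustration only).
docs = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "a cat and a dog",
]

for n in (1, 2, 3):
    size, sparsity = vocabulary_and_sparsity(docs, n)
    print(f"n={n}: vocabulary={size}, sparsity={sparsity:.2f}")
```

Even on three short sentences, moving from unigrams to bigrams enlarges the vocabulary while each document still fills only a handful of columns, so the share of zero entries rises. On a real tweet corpus the effect is far more pronounced.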

Practical implications

The findings of this study show that if urgent measures are not taken to combat hate speech, there could be dire consequences, especially in highly polarized societies that are frequently inflamed along religious and ethnic lines. On a daily basis, tempers flare on social media over comments made by participants. This study has also demonstrated that it is possible to implement a technology that can track and terminate hate speech on a micro-blog like Twitter. This can also be extended to other social media platforms.

Social implications

This study will help to promote a more positive society by ensuring that social media is used for the benefit of mankind.

Originality/value

The findings can be used by social media companies to monitor user behaviors, and pin hate crimes to specific persons. Governments and law enforcement bodies can also use the POSA application to track down hate peddlers.

Citation

Udanor, C. and Anyanwu, C.C. (2019), "Combating the challenges of social media hate speech in a polarized society: A Twitter ego lexalytics approach", Data Technologies and Applications, Vol. 53 No. 4, pp. 501-527. https://doi.org/10.1108/DTA-01-2019-0007

Publisher

Emerald Publishing Limited

Copyright © 2019, Emerald Publishing Limited