Search results

1 – 10 of over 39000
Article
Publication date: 13 July 2021

Shubham Bharti, Arun Kumar Yadav, Mohit Kumar and Divakar Yadav

Abstract

Purpose

With the rise of social media platforms, an increasing number of cyberbullying cases has emerged. Every day, a large number of people, especially teenagers, become victims of cyber abuse. Cyberbullying can have a long-lasting impact on a victim's mind: the victim may develop social anxiety, engage in self-harm, become depressed or, in extreme cases, be driven to suicide. This paper aims to evaluate various techniques for automatically detecting cyberbullying in tweets using machine learning and deep learning approaches.

Design/methodology/approach

The authors first applied machine learning algorithms and, after analyzing the experimental results, postulated that deep learning algorithms perform better for the task. Word-embedding techniques were used to represent words for model training. Pre-trained GloVe embeddings were used to generate the word embeddings; different versions of GloVe were used and their performance was compared. Bi-directional long short-term memory (BLSTM) was used for classification.

Findings

The dataset contains 35,787 labeled tweets. The GloVe840 word embedding technique along with BLSTM provided the best results on the dataset with an accuracy, precision and F1 measure of 92.60%, 96.60% and 94.20%, respectively.
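The abstract reports precision and F1 but not recall. Assuming F1 here is the usual harmonic mean of precision and recall, and that the reported figures are rounded, the implied recall can be recovered with a short calculation. The function name below is illustrative and not from the paper.

```python
def implied_recall(precision: float, f1: float) -> float:
    """Solve F1 = 2PR / (P + R) for R, given P and F1."""
    denom = 2 * precision - f1
    if denom <= 0:
        raise ValueError("no valid recall exists for these precision/F1 values")
    return f1 * precision / denom


# Values reported in the abstract (as fractions rather than percentages).
precision, f1 = 0.9660, 0.9420
recall = implied_recall(precision, f1)
print(f"implied recall = {recall:.4f}")
```

Under these assumptions the implied recall is roughly 92%, so the model trades a little recall for high precision.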

Research limitations/implications

If a word is not present in the pre-trained embedding (GloVe), it may be given a random vector representation that does not correspond to the actual meaning of the word. An out-of-vocabulary (OOV) word may therefore not be represented suitably, which can affect the detection of cyberbullying tweets. The problem may be rectified through the use of character-level word embeddings.
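The OOV limitation can be illustrated with a minimal sketch. The tiny embedding table, the dimension and the sampling range below are all hypothetical stand-ins; the abstract does not say how the authors draw random vectors.

```python
import hashlib
import random

EMBEDDING_DIM = 4  # toy size chosen for the example, not a real GloVe dimension

# Hypothetical stand-in for a loaded pre-trained embedding table.
pretrained = {
    "you": [0.1, -0.2, 0.3, 0.0],
    "are": [0.0, 0.4, -0.1, 0.2],
}


def is_oov(word, table=pretrained):
    """True when the word has no pre-trained vector."""
    return word not in table


def embed(word, table=pretrained, dim=EMBEDDING_DIM):
    """Return the pre-trained vector, or a random one for an OOV word."""
    if not is_oov(word, table):
        return table[word]
    # Seed from the word so the same OOV word always maps to the same vector.
    seed = int(hashlib.sha256(word.encode("utf-8")).hexdigest(), 16)
    rng = random.Random(seed)
    # The random vector carries no information about the word's meaning.
    return [rng.uniform(-0.25, 0.25) for _ in range(dim)]


for token in ["you", "are", "l0ser"]:
    print(token, is_oov(token), embed(token))
```

An obfuscated insult such as "l0ser" gets a vector unrelated to "loser", which is the kind of case a character-level embedding is meant to handle.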

Practical implications

The findings of the work may inspire entrepreneurs to leverage the proposed approach to build deployable systems that detect cyberbullying in different contexts, such as workplaces and schools, and may also draw the attention of lawmakers and policymakers to create systemic tools to tackle the ills of cyberbullying.

Social implications

Cyberbullying, if effectively detected, may save victims from various psychological problems, which, in turn, may lead society to a healthier and more productive life.

Originality/value

The proposed method produced results that outperform the state-of-the-art approaches in detecting cyberbullying from tweets. It uses a large dataset, created by intelligently merging two publicly available datasets. Further, a comprehensive evaluation of the proposed methodology has been presented.

Details

Kybernetes, vol. 51 no. 9
Type: Research Article
ISSN: 0368-492X

Open Access
Article
Publication date: 13 October 2022

Linzi Wang, Qiudan Li, Jingjun David Xu and Minjie Yuan

Abstract

Purpose

Mining user-concerned actionable and interpretable hot topics will help management departments fully grasp the latest events and make timely decisions. Existing topic models primarily integrate word embedding and matrix decomposition, which generate only keyword-based hot topics with weak interpretability, making it difficult to meet the specific needs of users. Mining phrase-based hot topics with syntactic dependency structure has been proven to model structure information effectively. A key challenge lies in the effective integration of the above information into the hot topic mining process.

Design/methodology/approach

This paper proposes the nonnegative matrix factorization (NMF)-based hot topic mining method, semantics syntax-assisted hot topic model (SSAHM), which combines semantic association and syntactic dependency structure. First, a semantic–syntactic component association matrix is constructed. Then, the matrix is used as a constraint condition to be incorporated into the block coordinate descent (BCD)-based matrix decomposition process. Finally, a hot topic information-driven phrase extraction algorithm is applied to describe hot topics.
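For readers unfamiliar with NMF-based topic mining, the sketch below factorizes a toy document-term matrix into document-topic and topic-term factors. It is not SSAHM: it uses the standard multiplicative update rules rather than block coordinate descent, and it omits the semantic–syntactic constraint matrix. The matrix values and function names are illustrative assumptions.

```python
import random


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Factorize V (docs x terms) into W (docs x topics) and H (topics x terms)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + eps for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return w, h


# Toy document-term counts: two documents per topic, five terms.
V = [
    [3, 2, 0, 0, 1],
    [2, 3, 0, 0, 0],
    [0, 0, 4, 3, 1],
    [0, 0, 3, 4, 0],
]
W, H = nmf(V, rank=2)
print("reconstruction error:", round(frobenius_error(V, W, H), 4))
```

Each row of H can be read as a keyword-weighted topic, which is the keyword-based output that the paper aims to improve on with phrase-level descriptions.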

Findings

The efficacy of the developed model is demonstrated on two real-world datasets, and the effects of dependency structure information on different topics are compared. The qualitative examples further explain the application of the method in real scenarios.

Originality/value

Most prior research focuses on keyword-based hot topics. This work advances the literature by mining phrase-based hot topics with syntactic dependency structure, which can effectively analyze the semantics. The syntactic dependency structure developed here combines word order and part-of-speech (POS), which is a step forward because the prior literature uses word order and POS only separately. Ignoring this synergy may miss important information, such as grammatical structure coherence and logical relations between syntactic components.

Details

Journal of Electronic Business & Digital Economics, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 2754-4214

Article
Publication date: 24 July 2020

Thanh-Tho Quan, Duc-Trung Mai and Thanh-Duy Tran

Abstract

Purpose

This paper proposes an approach to identify categorical influencers (i.e. influencers who are active in the targeted categories) in social media channels. Categorical influencers are important for media marketing, but detecting them automatically remains a challenge.

Design/methodology/approach

We applied emerging deep learning approaches. Specifically, we used word embedding to encode the semantic information of words occurring in the common microtext of social media and used a variational autoencoder (VAE) to approximate the topic modeling process, through which the active categories of influencers are automatically detected. We developed a system known as Categorical Influencer Detection (CID) to realize those ideas.

Findings

The approach of using VAE to simulate the Latent Dirichlet Allocation (LDA) process can effectively handle the task of topic modeling on the vast dataset of microtext on social media channels.

Research limitations/implications

This work has two major contributions. The first is the detection of topics in microtexts using a deep learning approach. The second is the identification of categorical influencers in social media.

Practical implications

This work can help brands to do digital marketing on social media effectively by approaching appropriate influencers. A real case study is given to illustrate it.

Originality/value

In this paper, we discuss an approach to automatically identify the active categories of influencers by performing topic detection from the microtext related to the influencers in social media channels. To do so, we use deep learning to approximate the topic modeling process of the conventional approaches (such as LDA).

Details

Online Information Review, vol. 44 no. 5
Type: Research Article
ISSN: 1468-4527

Article
Publication date: 24 September 2020

Toshiki Tomihira, Atsushi Otsuka, Akihiro Yamashita and Tetsuji Satoh

Abstract

Purpose

Recently, as Unicode has been standardized and social networking services have spread, the use of emojis has become common. Emojis are highly effective in expressing emotions in sentences. Sentiment analysis in natural language processing usually relies on manually labeled emotions for sentences. Using the emoji in text posted on social media, sentiment can be predicted without manual labeling. The purpose of this paper is to propose a new model that learns from sentences using emojis as labels, collecting English and Japanese tweets from Twitter as the corpus. The authors verify and compare multiple models based on attention long short-term memory (LSTM), convolutional neural networks (CNN) and Bidirectional Encoder Representations from Transformers (BERT).

Design/methodology/approach

Using the Twitter application programming interface, the authors collected tweets containing 2,661 kinds of emoji registered as Unicode characters, for a total of 6,149,410 tweets in Japanese. First, the authors visualized the vector space that Word2Vec produces for the emojis and found that emojis and words with similar meanings are adjacent, which verifies that emoji can be used for sentiment analysis. Second, tweets containing emojis were fed to the models, which were trained and tested with the emoji as the label. The authors compared the BERT model with the conventional models [CNN, FastText and attention bidirectional long short-term memory (BiLSTM)] that achieved high scores in the previous study.
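The idea of using an emoji as the label can be sketched as follows. The four-emoji label set and the rule of taking the first emoji as the label are assumptions made for the example; the abstract does not specify how tweets with several emojis are handled.

```python
# Hypothetical label set; the study itself reports 2,661 kinds of emoji.
EMOJI_LABELS = {"😂", "😭", "🔥", "🎉"}


def to_training_pair(tweet):
    """Split a tweet into (text without emoji, emoji label), or None if no emoji."""
    found = [ch for ch in tweet if ch in EMOJI_LABELS]
    if not found:
        return None  # tweets without a known emoji cannot be labeled this way
    label = found[0]  # assumption: the first emoji is used as the label
    text = "".join(ch for ch in tweet if ch not in EMOJI_LABELS)
    text = " ".join(text.split())  # collapse whitespace left behind by removal
    return text, label


tweets = [
    "That was hilarious 😂",
    "Missed the last train 😭 again",
    "no emoji here",
]
pairs = [p for p in map(to_training_pair, tweets) if p is not None]
print(pairs)
```

No manual annotation is needed: the writer's own emoji supplies the label, and the remaining text becomes the classifier input.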

Findings

By visualizing the Word2Vec vector space, the authors found that emojis and words with similar meanings are adjacent, which verifies that emoji can be used for sentiment analysis. The authors obtained a higher score with BERT models than with the conventional models, and the experiments show this improvement in both languages. General emoji prediction is greatly influenced by context, and the score may be lowered when the meaning is misunderstood. By using BERT, which is based on a bi-directional transformer, the authors can take the context into account.

Practical implications

When a word is typed using an input method editor (IME), emoji can appear among the suggested output words. Current IMEs consider only the most recently inputted word, whereas this study shows that it is possible to recommend emojis considering the context of the inputted sentence. Therefore, the research can be used to improve IME performance in the future.

Originality/value

In the paper, the authors focus on multilingual emoji prediction. This is the first attempt to compare emoji prediction between Japanese and English. It is also the first attempt to use the transformer-based BERT model for predicting limited emojis, although the transformer is known to be effective for various NLP tasks. The authors found that a bidirectional transformer is suitable for emoji prediction.

Details

International Journal of Web Information Systems, vol. 16 no. 3
Type: Research Article
ISSN: 1744-0084

Details

Using Subject Headings for Online Retrieval: Theory, Practice and Potential
Type: Book
ISBN: 978-0-12221-570-4

Open Access
Article
Publication date: 14 August 2020

Paramita Ray and Amlan Chakrabarti

Abstract

Social networks have changed communication patterns significantly. Information available from different social networking sites can be utilized to analyze users' opinions. Organizations would therefore benefit from a platform that can analyze public sentiment in social media about their products and services and so add value to their business processes. Over the last few years, deep learning has become very popular in areas such as image classification and speech recognition. However, research on the use of deep learning methods in sentiment analysis is limited. It has been observed that in some cases the existing machine learning methods for sentiment analysis fail to extract some implicit aspects and might not be very useful. Therefore, we propose a deep learning approach for extracting aspects from text and analyzing users' sentiment corresponding to each aspect. A seven-layer deep convolutional neural network (CNN) is used to tag each aspect in the opinionated sentences. We have combined the deep learning approach with a set of rule-based approaches to improve the performance of both the aspect extraction method and the sentiment scoring method. We have also tried to improve the existing rule-based approach to aspect extraction by categorizing aspects into a predefined set of aspect categories using a clustering method, and compared our proposed method with some of the state-of-the-art methods. The overall accuracy of our proposed method is 0.87, while that of other state-of-the-art methods, namely the modified rule-based method and CNN, is 0.75 and 0.80, respectively. The overall accuracy of our proposed method is thus 7–12 percentage points higher than that of the state-of-the-art methods.

Details

Applied Computing and Informatics, vol. 18 no. 1/2
Type: Research Article
ISSN: 2634-1964

Article
Publication date: 6 September 2022

Hanane Sebbaq and Nour-eddine El Faddouli

Abstract

Purpose

This study has two aims. First, to cope with the limited availability of annotated data and to identify the cognitive level of learning objectives efficiently, it adopts transfer learning using word2vec and a bidirectional gated recurrent unit (GRU) network that takes the full context into account and improves classification. It adds a layer based on an attention mechanism (AM), which captures the context vector and gives keywords higher weight for text classification. Second, it explains the model's results with local interpretable model-agnostic explanations (LIME).
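How an attention layer "gives keywords higher weight" can be sketched with a simple dot-product attention over per-token hidden states. This is a generic illustration: the scoring function, the query vector and the toy hidden states are assumptions, and the paper's actual attention formulation is not given in the abstract.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(hidden_states, query):
    """Return (attention weights, context vector) for per-token hidden states."""
    # Score each token by the dot product of its hidden state with the query.
    scores = [sum(h_d * q_d for h_d, q_d in zip(h, query)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    # Context vector: attention-weighted sum of the hidden states.
    context = [sum(w * h[d] for w, h in zip(weights, hidden_states))
               for d in range(dim)]
    return weights, context


# Toy hidden states for three tokens (stand-ins for BiGRU outputs).
hidden_states = [[0.1, 0.0], [0.9, 0.8], [0.2, 0.1]]
query = [1.0, 1.0]  # hypothetical learned query vector
weights, context = attend(hidden_states, query)
print([round(w, 3) for w in weights], [round(c, 3) for c in context])
```

The token whose hidden state aligns best with the query receives the largest weight and therefore dominates the context vector passed to the classifier.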

Design/methodology/approach

Bloom's taxonomy levels of cognition are commonly used as a reference standard for identifying e-learning contents. Many action verbs in Bloom's taxonomy, however, overlap at different levels of the hierarchy, causing uncertainty regarding the cognitive level expected. Some studies have looked into the cognitive classification of e-learning content, but none has looked into learning objectives. Moreover, most of these studies adopt only classical machine learning algorithms. The main constraint of this study is the availability of annotated learning objectives data sets. This study managed to build a data set of 2,400 learning objectives, but this size remains limited.

Findings

This study’s experiments show that the proposed model achieves the best scores for accuracy (90.62%), F1-score and loss. The proposed model succeeds in classifying learning objectives that contain ambiguous verbs from the Bloom’s taxonomy action verbs, while the same model without the attention layer fails. This study’s LIME explainer aids in visualizing the most essential features of the text, which contributes to justifying the final classification.

Originality/value

In this study, the main objective is to propose a model that outperforms the baseline models for learning objectives classification based on the six cognitive levels of Bloom's taxonomy. To this end, this study builds the bidirectional GRU (BiGRU)-attention model, which combines the BiGRU algorithm with the AM, and feeds the architecture with word2vec embeddings. To prove the effectiveness of the proposed model, this study compares it with GRU and with four classical machine learning algorithms that are widely used for the cognitive classification of text: naive Bayes, logistic regression, support vector machine and K-nearest neighbors. The main constraint related to this study is the absence of annotated data; there is no annotated learning objective data set based on Bloom’s taxonomy's cognitive levels. To overcome this problem, the authors had to build the data set themselves.

Article
Publication date: 12 January 2021

Hui Yuan, Yuanyuan Tang, Wei Xu and Raymond Yiu Keung Lau

Abstract

Purpose

Despite the extensive academic interest in social media sentiment for financial fields, multimodal data in the stock market has been neglected. The purpose of this paper is to explore the influence of multimodal social media data on stock performance, and investigate the underlying mechanism of two forms of social media data, i.e. text and pictures.

Design/methodology/approach

This research employs panel vector autoregressive models to quantify the effect of the sentiment derived from two modalities in social media, i.e. text information and picture information. Through the models, the authors examine the short-term and long-term associations between social media sentiment and stock performance, measured by three metrics. First, the authors design an enhanced sentiment analysis method, integrating random walk and word embeddings through Global Vectors for Word Representation (GloVe), to construct a domain-specific lexicon and apply it to textual sentiment analysis. Second, the authors exploit a deep learning framework based on convolutional neural networks to analyze the sentiment in picture data.

Findings

The empirical results derived from vector autoregressive models reveal that both measures of the sentiment extracted from textual information and pictorial information in social media are significant leading indicators of stock performance. Moreover, pictorial information and textual information have similar relationships with stock performance.

Originality/value

To the best of the authors’ knowledge, this is the first study that incorporates multimodal social media data for sentiment analysis, which is valuable in understanding pictures in social media data. The study offers significant implications for researchers and practitioners. It draws researchers' attention to multimodal social media data. The study’s findings provide some managerial recommendations, e.g. watching not only words but also pictures in social media.

Details

Internet Research, vol. 31 no. 3
Type: Research Article
ISSN: 1066-2243

Open Access
Article
Publication date: 21 June 2021

Bufei Xing, Haonan Yin, Zhijun Yan and Jiachen Wang

Abstract

Purpose

The purpose of this paper is to propose a new approach to retrieve similar questions in online health communities to improve the efficiency of health information retrieval and sharing.

Design/methodology/approach

This paper proposes a hybrid approach that combines domain knowledge similarity and topic similarity to retrieve similar questions in online health communities. The domain knowledge similarity evaluates the domain distance between different questions, and the topic similarity measures the relationship between questions based on the extracted latent topics.
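A hybrid score of this kind can be sketched as a weighted sum of two similarities. Everything specific here is an assumption made for illustration: Jaccard overlap of named entities stands in for the domain knowledge similarity, cosine similarity of topic distributions stands in for the topic similarity, and the weight alpha is hypothetical. The abstract does not state the paper's actual measures or how they are combined.

```python
import math


def jaccard(a, b):
    """Overlap of two entity sets; 0.0 when both are empty."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 for a zero vector."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def hybrid_similarity(entities_a, entities_b, topics_a, topics_b, alpha=0.5):
    """Weighted sum of entity overlap and topic-distribution similarity."""
    return (alpha * jaccard(entities_a, entities_b)
            + (1 - alpha) * cosine(topics_a, topics_b))


# Two toy questions: extracted entities plus a three-topic distribution each.
score = hybrid_similarity(
    {"diabetes", "insulin"}, {"diabetes", "metformin"},
    [0.7, 0.2, 0.1], [0.6, 0.3, 0.1],
)
print(round(score, 3))
```

Because the entity term matches concepts rather than surface words, two questions can score as similar even when their wording differs, which is the word-mismatch problem the paper targets.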

Findings

The experiment results show that the proposed method outperforms the baseline methods.

Originality/value

This method overcomes the problem of word mismatch and considers the named entities included in questions, which most existing studies did not.

Details

International Journal of Crowd Science, vol. 5 no. 2
Type: Research Article
ISSN: 2398-7294

Article
Publication date: 7 October 2021

Juan Yang, Xu Du, Jui-Long Hung and Chih-hsiung Tu

Abstract

Purpose

Critical thinking is considered important in psychological science because it enables students to make effective decisions and optimizes their performance. To address the challenges of understanding students' critical thinking, this study analyzes online discussion data through an advanced multi-feature fusion modeling (MFFM) approach to automatically and accurately identify students' critical thinking levels.

Design/methodology/approach

An advanced MFFM approach is proposed in this study. Specifically, considering the time-series characteristic of discussion contents and the high correlations between adjacent words, a long short-term memory–convolutional neural network (LSTM-CNN) architecture is proposed to extract deep semantic features. These semantic features are then combined with linguistic and psychological knowledge generated by the LIWC2015 tool as the inputs of fully connected layers, which automatically and accurately predict the students' critical thinking levels that are hidden in online discussion data.

Findings

A series of experiments with 94 students' 7,691 posts were conducted to verify the effectiveness of the proposed approach. The experimental results show that the proposed MFFM approach that combines two types of textual features outperforms baseline methods, and the semantic-based padding can further improve the prediction performance of MFFM. It can achieve 0.8205 overall accuracy and 0.6172 F1 score for the “high” category on the validation dataset. Furthermore, it is found that the semantic features extracted by LSTM-CNN are more powerful for identifying self-introduction or off-topic discussions, while the linguistic, as well as psychological features, can better distinguish the discussion posts with the highest critical thinking level.

Originality/value

With the support of the proposed MFFM approach, online teachers can conveniently and effectively understand the interaction quality of online discussions, which can support instructional decision-making to better promote students' knowledge construction process and improve learning performance.

Details

Data Technologies and Applications, vol. 56 no. 2
Type: Research Article
ISSN: 2514-9288
