Search results

1 – 10 of over 1000
Article
Publication date: 1 February 2021

Narasimhulu K, Meena Abarna KT and Sivakumar B

The purpose of the paper is to study multiple viewpoints, which are required to access more informative similarity features among tweet documents, which is useful for…

Abstract

Purpose

The purpose of the paper is to study multiple viewpoints, which are required to access more informative similarity features among tweet documents and are useful for achieving robust tweet data clustering results.

Design/methodology/approach

Let N be the number of tweet documents used for topic extraction. In the initial pre-processing step, unwanted text, punctuation and other symbols are removed, and tokenization and stemming are performed. A bag-of-features is determined for the tweets; the tweets are then modelled with the obtained bag-of-features during topic extraction. Approximate topic features are extracted for every tweet document, and these sets of topic features of the N documents are treated as multi-viewpoints. The key idea of the proposed work is to use these multi-viewpoints in the similarity-feature computation. For example, with five tweet documents (here N = 5) defined in the projected space as viewpoints v1, v2, v3, v4 and v5, the similarity features between two documents (viewpoints v1 and v2) are computed with respect to the other three viewpoints (v3, v4 and v5), unlike the single viewpoint used in the traditional cosine metric.
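
A minimal sketch of multi-viewpoint cosine similarity, assuming each tweet is already represented by a topic-feature vector; the function and variable names are illustrative and not the authors' implementation.

import numpy as np

def multi_viewpoint_similarity(docs: np.ndarray, i: int, j: int) -> float:
    """Similarity of documents i and j measured from every other
    document (viewpoint) h, then averaged."""
    n = docs.shape[0]
    sims = []
    for h in range(n):
        if h in (i, j):
            continue
        a = docs[i] - docs[h]          # document i seen from viewpoint h
        b = docs[j] - docs[h]          # document j seen from viewpoint h
        denom = np.linalg.norm(a) * np.linalg.norm(b)
        if denom > 0:
            sims.append(float(a @ b) / denom)
    return float(np.mean(sims)) if sims else 0.0

# Example with N = 5 topic-feature vectors (v1..v5 in the text)
rng = np.random.default_rng(0)
tweets = rng.random((5, 10))
print(multi_viewpoint_similarity(tweets, 0, 1))  # v1 vs v2, judged from v3, v4, v5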

Findings

The approach addresses health-care problems with tweet data. Topic models play a crucial role in the classification of health-related tweets by finding topics (or health clusters) instead of computing term frequency–inverse document frequency (TF–IDF) for unlabelled tweets.

Originality/value

Topic models play a crucial role in the classification of health-related tweets by finding topics (or health clusters) instead of computing TF-IDF for unlabelled tweets.

Details

International Journal of Intelligent Computing and Cybernetics, vol. 14 no. 2
Type: Research Article
ISSN: 1756-378X

Article
Publication date: 21 October 2021

Noorullah Renigunta Mohammed and Moulana Mohammed

The purpose of this study is to show that, for eHealth text mining domains, cosine-based visual methods (VM) assess clusters more accurately than Euclidean ones and are therefore recommended for tweet data…

Abstract

Purpose

In eHealth text mining domains, cosine-based visual methods (VM) assess clusters more accurately than Euclidean ones and are therefore recommended for tweet data models in cluster assessment. Such VM determine the clusters with respect to a single viewpoint or none, which is less informative. Multi-viewpoints (MVP) were used to provide a more informative cluster assessment of health-care tweet documents and to demonstrate visual analysis of cluster tendency.

Design/methodology/approach

In this paper, the authors proposed MVP-based VM that combine traditional topic models with visual techniques to find cluster tendency and partition the data for cluster validity, in order to propose health-care recommendations based on tweets. The authors demonstrated the effectiveness of the proposed methods on different real-time Twitter health-care data sets in the experimental study. They also carried out a comparative analysis of the proposed models against the existing visual assessment of cluster tendency (VAT) and cVAT models, using cluster validity indices and computational complexities; the examples suggest that the MVP VM are more informative.
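
For context, a minimal sketch of the standard VAT reordering that such visual methods build on; this is the generic algorithm, not the authors' MVP variant, and the helper names are illustrative.

import numpy as np

def vat_order(D: np.ndarray) -> np.ndarray:
    """Prim's-MST-style ordering of indices for a dissimilarity matrix D."""
    n = D.shape[0]
    # start from one endpoint of the largest dissimilarity
    i = int(np.unravel_index(np.argmax(D), D.shape)[0])
    order = [i]
    remaining = set(range(n)) - {i}
    while remaining:
        rem = sorted(remaining)
        sub = D[np.ix_(order, rem)]              # distances from ordered set to the rest
        _, c = np.unravel_index(np.argmin(sub), sub.shape)
        nxt = rem[c]
        order.append(nxt)
        remaining.remove(nxt)
    return np.array(order)

# Two synthetic groups; dark diagonal blocks in an image plot of the
# reordered matrix would suggest two clusters.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (20, 5)), rng.normal(5, 1, (20, 5))])
D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
idx = vat_order(D)
D_reordered = D[np.ix_(idx, idx)]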

Findings

In this paper, the authors proposed MVP-based VM that combine traditional topic models with visual techniques to find cluster tendency and partition the data for cluster validity, in order to propose health-care recommendations based on tweets.

Originality/value

In this paper, the authors proposed, for the first time, a multi-viewpoints distance metric for topic-model cluster tendency, together with a visual representation based on VAT images and hybrid topic models, to find cluster tendency and partition the data for cluster validity, in order to propose health-care recommendations based on tweets.

Details

International Journal of Pervasive Computing and Communications, vol. 18 no. 1
Type: Research Article
ISSN: 1742-7371

Article
Publication date: 10 August 2021

Elham Amirizadeh and Reza Boostani

The aim of this study is to propose a deep neural network (DNN) method that uses side information to improve clustering results for big datasets; also, the authors show that…

Abstract

Purpose

The aim of this study is to propose a deep neural network (DNN) method that uses side information to improve clustering results for big datasets; the authors also show that applying this information improves clustering performance and increases the speed of network training convergence.

Design/methodology/approach

In data mining, semisupervised learning is an interesting approach because good performance can be achieved with a small subset of labeled data; one reason is that data labeling is expensive, and semisupervised learning does not need all labels. One type of semisupervised learning is constrained clustering; this type of learning does not use class labels for clustering. Instead, it uses information about some pairs of instances (side information), which may belong to the same cluster (must-link [ML]) or to different clusters (cannot-link [CL]). Constrained clustering has been studied extensively; however, few works have focused on constrained clustering for big datasets. In this paper, the authors present a constrained clustering method for big datasets that uses a DNN. They inject the constraints (ML and CL) into this DNN to promote clustering performance and call it constrained deep embedded clustering (CDEC). In this manner, an autoencoder is used to elicit informative low-dimensional features in the latent space, and the encoder network is then retrained using a proposed Kullback–Leibler divergence objective function, which captures the constraints in order to cluster the projected samples. The proposed CDEC was compared with the adversarial autoencoder, constrained 1-spectral clustering and autoencoder + k-means on the well-known MNIST, Reuters-10k and USPS datasets, and their performance was assessed in terms of clustering accuracy. Empirical results confirmed the statistical superiority of CDEC over the counterparts in terms of clustering accuracy.
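
A minimal PyTorch sketch of how ML/CL constraints could be injected into a DEC-style Kullback–Leibler objective; the exact loss and penalty form used by CDEC may differ, and all names here are illustrative assumptions.

import torch

def soft_assignments(z, centers, alpha=1.0):
    """Student's t soft cluster assignments q_ij, as in deep embedded clustering."""
    d2 = torch.cdist(z, centers) ** 2
    q = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(dim=1, keepdim=True)

def target_distribution(q):
    p = q ** 2 / q.sum(dim=0)
    return p / p.sum(dim=1, keepdim=True)

def cdec_loss(z, centers, ml_pairs, cl_pairs, lam=0.1):
    q = soft_assignments(z, centers)
    p = target_distribution(q).detach()
    kl = torch.sum(p * torch.log(p / q))                            # clustering term
    ml = sum(torch.sum((q[i] - q[j]) ** 2) for i, j in ml_pairs)    # pull ML pairs together
    cl = sum(torch.sum(q[i] * q[j]) for i, j in cl_pairs)           # push CL pairs apart
    return kl + lam * (ml + cl)

# latent codes from an autoencoder's encoder, 3 clusters (toy sizes)
z = torch.randn(100, 10, requires_grad=True)
centers = torch.randn(3, 10, requires_grad=True)
loss = cdec_loss(z, centers, ml_pairs=[(0, 1)], cl_pairs=[(2, 3)])
loss.backward()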

Findings

First, this is the first DNN-based constrained clustering method that uses side information to improve clustering performance without using labels in big, high-dimensional datasets. Second, the authors defined a formula to inject side information into the DNN. Third, the proposed method improves clustering performance and network convergence speed.

Originality/value

Few works have focused on constrained clustering for big datasets; moreover, studies of DNNs for clustering with a specific loss function that simultaneously extracts features and clusters the data are rare. The method improves the performance of big data clustering without using labels, which is important because data labeling is expensive and time-consuming, especially for big datasets.

Details

International Journal of Intelligent Computing and Cybernetics, vol. 14 no. 4
Type: Research Article
ISSN: 1756-378X

Article
Publication date: 8 February 2016

Yoosin Kim, Rahul Dwivedi, Jie Zhang and Seung Ryul Jeong

The purpose of this paper is to mine competitive intelligence in social media to find the market insight by comparing consumer opinions and sales performance of a business and one…

Abstract

Purpose

The purpose of this paper is to mine competitive intelligence from social media and find market insights by comparing the consumer opinions and sales performance of a business and one of its competitors through analysis of public social media data.

Design/methodology/approach

An exploratory test using a multiple case study approach was used to compare two competing smartphone manufacturers. Opinion mining and sentiment analysis are conducted first, followed by further validation of results using statistical analysis. A total of 229,948 tweets mentioning the iPhone6 or the GalaxyS5 have been collected for four months following the release of the iPhone6; these have been analyzed using natural language processing, lexicon-based sentiment analysis, and purchase intention classification.
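
A minimal sketch of lexicon-based sentiment scoring with a purchase-intention keyword check, in the spirit of the pipeline described above; the tiny lexicon and intent terms are illustrative assumptions, not the study's actual resources or classifier.

import re

POSITIVE = {"love", "great", "awesome", "amazing", "good"}
NEGATIVE = {"hate", "bad", "terrible", "slow", "broken"}
INTENT = {"buy", "buying", "preorder", "getting", "upgrade"}

def analyse_tweet(text: str) -> dict:
    tokens = re.findall(r"[a-z']+", text.lower())
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    return {
        "sentiment": "positive" if score > 0 else "negative" if score < 0 else "neutral",
        "purchase_intention": any(t in INTENT for t in tokens),
    }

print(analyse_tweet("I love the iPhone6, thinking of buying one this week"))
# {'sentiment': 'positive', 'purchase_intention': True}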

Findings

The analysis showed that social media data contain competitive intelligence. The volume of tweets revealed a significant gap between the market leader and one follower; the purchase intention data also reflected this gap, but to a less pronounced extent. In addition, the authors assessed whether social opinion could explain the sales performance gap between the competitors, and found that the social opinion gap was similar to the shipment gap.

Research limitations/implications

This study compared the social media opinion and shipment gaps between two rival smartphones. Through social media analytics, a business can gauge consumers’ opinions not only of its own product but also of its competitors’ products. Furthermore, the business can predict market sales performance and estimate the gap with competing products. As a result, decision makers can adjust market strategy rapidly and compensate for weaknesses relative to rivals.

Originality/value

This paper’s main contribution is to demonstrate competitive intelligence gathering via consumer opinion mining of social media data. Researchers, business analysts and practitioners can adopt this method of social media analysis to achieve their objectives and to implement practical procedures for data collection, spam elimination, machine learning classification, sentiment analysis, feature categorization and result visualization.

Details

Online Information Review, vol. 40 no. 1
Type: Research Article
ISSN: 1468-4527

Article
Publication date: 12 August 2021

Pooja Rani, Rajneesh Kumar and Anurag Jain

Decision support systems developed using machine learning classifiers have become a valuable tool in predicting various diseases. However, the performance of these systems is…

Abstract

Purpose

Decision support systems developed using machine learning classifiers have become a valuable tool in predicting various diseases. However, the performance of these systems is adversely affected by the missing values in medical datasets. Imputation methods are used to predict these missing values. In this paper, a new imputation method called hybrid imputation optimized by the classifier (HIOC) is proposed to predict missing values efficiently.

Design/methodology/approach

The proposed HIOC is developed by using a classifier to combine multivariate imputation by chained equations (MICE), K nearest neighbor (KNN), mean and mode imputation methods in an optimum way. The performance of HIOC has been compared to the MICE, KNN, mean and mode methods. Four classifiers, support vector machine (SVM), naive Bayes (NB), random forest (RF) and decision tree (DT), have been used to evaluate the performance of the imputation methods.
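
A minimal sketch of classifier-guided selection among imputation methods (a MICE-like iterative imputer, KNN and mean), in the spirit of HIOC but not the authors' exact combination rule; the selection criterion and names are assumptions.

import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer, KNNImputer, SimpleImputer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def best_imputation(X: np.ndarray, y: np.ndarray):
    candidates = {
        "mice": IterativeImputer(random_state=0),
        "knn": KNNImputer(n_neighbors=5),
        "mean": SimpleImputer(strategy="mean"),
    }
    scores = {}
    for name, imputer in candidates.items():
        X_imp = imputer.fit_transform(X)                        # fill missing values
        clf = RandomForestClassifier(random_state=0)
        scores[name] = cross_val_score(clf, X_imp, y, cv=5).mean()
    best = max(scores, key=scores.get)                          # pick by downstream accuracy
    return best, scores

# toy data with ~20% missing values
rng = np.random.default_rng(0)
X = rng.random((200, 6))
X[rng.random(X.shape) < 0.2] = np.nan
y = (rng.random(200) > 0.5).astype(int)
print(best_imputation(X, y))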

Findings

The results show that HIOC performed efficiently even with a high rate of missing values. It reduced root mean square error (RMSE) by up to 17.32% on the heart disease dataset and 34.73% on the breast cancer dataset. Correct prediction of missing values improved the accuracy of the classifiers in predicting diseases, increasing classification accuracy by up to 18.61% on the heart disease dataset and 6.20% on the breast cancer dataset.

Originality/value

The proposed HIOC is a new hybrid imputation method that can efficiently predict missing values in any medical dataset.

Details

International Journal of Intelligent Computing and Cybernetics, vol. 14 no. 4
Type: Research Article
ISSN: 1756-378X

Article
Publication date: 20 August 2018

Gabriella Casalino, Ciro Castiello, Nicoletta Del Buono and Corrado Mencar

The purpose of this paper is to propose a framework for intelligent analysis of Twitter data. The purpose of the framework is to allow users to explore a collection of tweets by…

Abstract

Purpose

The purpose of this paper is to propose a framework for intelligent analysis of Twitter data. The purpose of the framework is to allow users to explore a collection of tweets by extracting topics with semantic relevance. In this way, it is possible to detect groups of tweets related to new technologies, events and other topics that are automatically discovered.

Design/methodology/approach

The framework is based on a three-stage process. The first stage is devoted to dataset creation, transforming a collection of tweets into a dataset according to the vector space model. The second stage, which is the core of the framework, is centered on the use of non-negative matrix factorization (NMF) for extracting human-interpretable topics from tweets, which are eventually clustered. The number of topics can be user-defined or discovered automatically by applying subtractive clustering as a preliminary step before factorization. Cluster analysis and word-cloud visualization are used in the last stage to enable intelligent data analysis.
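
A minimal sketch of the NMF-based topic extraction stage using scikit-learn; the paper's subtractive-clustering initialization and Italian corpora are not reproduced, and the sample tweets below are illustrative.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import NMF

tweets = [
    "new smartphone release announced today",
    "great concert downtown tonight",
    "smartphone battery life is amazing",
    "tonight's concert was unforgettable",
]

# Stage 1: vector space model
vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(tweets)

# Stage 2: factorize into topics (number of topics user-defined here)
nmf = NMF(n_components=2, init="nndsvd", random_state=0)
W = nmf.fit_transform(X)          # tweet-topic weights, usable for clustering
H = nmf.components_               # topic-term weights, usable for word clouds

terms = vectorizer.get_feature_names_out()
for k, topic in enumerate(H):
    top = [terms[i] for i in topic.argsort()[::-1][:3]]
    print(f"topic {k}: {top}")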

Findings

The authors applied the framework to a case study of three collections of Italian tweets, with both manual and automatic selection of the number of topics. Given the high sparsity of Twitter data, the authors also investigated the influence of different initialization mechanisms for NMF on the factorization results. Numerical comparisons confirm that NMF can be used for clustering, as it is comparable to classical clustering techniques such as spherical k-means. Visual inspection of the word clouds allowed a qualitative assessment of the results that confirmed the expected outcomes.

Originality/value

The proposed framework enables a collaborative approach between users and computers for an intelligent analysis of Twitter data. Users are faced with interpretable descriptions of tweet clusters, which can be interactively refined with few adjustable parameters. The resulting clusters can be used for intelligent selection of tweets, as well as for further analytics concerning the impact of products, events, etc. in the social network.

Details

International Journal of Web Information Systems, vol. 14 no. 3
Type: Research Article
ISSN: 1744-0084

Article
Publication date: 11 July 2017

Andrew Rogers, Kate L. Daunt, Peter Morgan and Malcolm Beynon

The theory of double jeopardy (DJ) is shown to hold across broad ranging geographies and physical product categories. However, there is very little research appertaining to the…

Abstract

Purpose

The theory of double jeopardy (DJ) is shown to hold across broad-ranging geographies and physical product categories. However, there is very little research appertaining to the subject within an online environment. In particular, studies that investigate the presence of DJ and the contrasting viewpoint to DJ, namely, negative double jeopardy (NDJ), are lacking. This study aims to contribute to this identified research gap and examines the presence of DJ and NDJ within a product category, utilising data from Twitter.

Design/methodology/approach

A total of 354,676 tweets are scraped from Twitter and their sentiment analysed and allocated into positive, negative and no-opinion clusters using fuzzy c-means clustering. The sentiment is then compared to the market share of brands within the beer product category to establish whether a DJ or NDJ effect is present.
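
A minimal NumPy sketch of fuzzy c-means, as used above to allocate tweet sentiment into positive, negative and no-opinion clusters; this is the generic algorithm, and the one-dimensional sentiment scores below are illustrative, not the study's data.

import numpy as np

def fuzzy_c_means(X, c=3, m=2.0, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    U = rng.random((n, c))
    U /= U.sum(axis=1, keepdims=True)                       # fuzzy memberships
    for _ in range(iters):
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]      # weighted centroids
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1) + 1e-9
        U = 1.0 / (d ** (2 / (m - 1)) * np.sum(d ** (-2 / (m - 1)), axis=1, keepdims=True))
    return centers, U

# one sentiment score per tweet: negative, no-opinion and positive regions
scores = np.array([-0.9, -0.7, -0.05, 0.0, 0.1, 0.8, 0.95]).reshape(-1, 1)
centers, U = fuzzy_c_means(scores)
print(centers.ravel())          # approximate cluster centres
print(U.argmax(axis=1))         # hard assignment of each tweet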

Findings

The data reveal an NDJ effect with regard to original tweets (i.e. tweets that have not been retweeted). That is, when analysing tweets relating to brands within a defined beer category, the authors find that larger brands suffer from increased negativity among the larger proportion of tweets associated with them.

Research limitations/implications

The clustering approach to analyse sentiment in Twitter data brings a new direction to analysis of such sentiment. Future consideration of different numbers of clusters may further the insights this form of analysis can bring to the DJ/NDJ phenomenon. Managerial implications discuss the uncovered practitioner’s paradox of NDJ and strategies for dealing with DJ and NDJ effects.

Originality/value

This study is the first to explore the presence of DJ and NDJ through the utilisation of sentiment analysis-derived data and fuzzy clustering. DJ and NDJ are under-explored constructs in the online environment. Typically, past research examines DJ and NDJ in separate and detached fashions. Thus, the study is of theoretical value because it outlines boundaries to the DJ and NDJ conditions. Second, this research is the first study to analyse the sentiment of consumer-authored tweets to explore DJ and NDJ effects. Finally, the current study offers valuable insight into the DJ and NDJ effects for practicing marketing managers.

Details

European Journal of Marketing, vol. 51 no. 7/8
Type: Research Article
ISSN: 0309-0566

Article
Publication date: 20 August 2018

Dharini Ramachandran and Parvathi Ramasubramanian

“What’s happening?” around you can be spread to everybody through today’s pervasive social media. It provides a powerful platform that brings to light the latest news, trends…

Abstract

Purpose

“What’s happening?” around you can be spread to everybody through today’s pervasive social media, which provides a powerful platform that brings to light the latest news, trends and happenings around the world in “near instant” time. A microblog is a popular Web service that enables users to post small pieces of digital content, such as text, pictures, videos and links to external resources. Raw microblog data prove indispensable for extracting information, offering a way to single out the physical events and popular topics prevalent in social media. This study aims to present and review the varied methods carried out for event detection from microblogs. An event is an activity or action with a clear finite duration in which the target entity plays a key role. Event detection helps in the timely understanding of people’s opinions and the actual condition of detected events.

Design/methodology/approach

This paper presents a study of various approaches adopted for event detection from microblogs. The approaches are reviewed according to the techniques used, applications and the element detected (event or topic).

Findings

The various ideas explored, the important observations inferred, the corresponding outcomes and an assessment of the results from those approaches are discussed.

Originality/value

The approaches and techniques for event detection are studied in two categories: first, based on the kind of event being detected (physical occurrence or emerging/popular topic); and second, within each category, the approaches are further categorized into supervised and unsupervised techniques.

Article
Publication date: 20 April 2015

Takuya Sugitani, Masumi Shirakawa, Takahiro Hara and Shojiro Nishio

The purpose of this paper is to propose a method to detect local events in real time using Twitter, an online microblogging platform. The authors especially aim at detecting local…

Abstract

Purpose

The purpose of this paper is to propose a method to detect local events in real time using Twitter, an online microblogging platform. The authors especially aim at detecting local events regardless of the type and scale.

Design/methodology/approach

The method is based on the observation that relevant tweets (Twitter posts) are simultaneously posted from the place where a local event is happening. Specifically, the method first extracts the place where and the time when multiple tweets are posted, using a hierarchical clustering technique. It then detects the co-occurrences of key terms in each spatiotemporal cluster to find local events. To determine key terms, it computes term frequency-inverse document frequency (TF-IDF) scores based on the spatiotemporal locality of tweets.
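
A minimal sketch of the spatiotemporal grouping plus locality-weighted term scoring described above, using SciPy hierarchical clustering; the coordinate scaling, distance threshold and sample tweets are assumptions, not the paper's values.

from collections import Counter
from math import log
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# (lat, lon, minutes-since-start, text) for a handful of geotagged tweets
tweets = [
    (35.68, 139.76, 0, "fireworks at the river tonight"),
    (35.68, 139.77, 3, "amazing fireworks show"),
    (35.69, 139.76, 5, "fireworks and food stalls"),
    (34.70, 135.50, 2, "train delay on the loop line"),
    (34.70, 135.49, 6, "huge delay at the station"),
]

features = np.array([[la, lo, t / 60.0] for la, lo, t, _ in tweets])   # crude space-time scaling
labels = fcluster(linkage(features, method="single"), t=0.5, criterion="distance")

# term score: frequency inside the cluster, discounted by how many clusters use the term
docs = {c: Counter() for c in set(labels)}
for (_, _, _, text), c in zip(tweets, labels):
    docs[c].update(text.split())
for c, counts in docs.items():
    scores = {w: n * log(len(docs) / sum(w in d for d in docs.values()))
              for w, n in counts.items()}
    print(c, sorted(scores, key=scores.get, reverse=True)[:3])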

Findings

From the experimental results using geotagged tweet data posted between 9 a.m. and 3 p.m. on October 9, 2011, the method significantly improved precision, by between 50 and 100 per cent at the same recall, compared to a baseline method.

Originality/value

In contrast to existing work, the method described in this paper can detect various types of small-scale local events as well as large-scale ones by incorporating the spatiotemporal feature of tweet postings and the text relevance of tweets. The findings will be useful to researchers who are interested in real-time event detection using microblogs.

Details

International Journal of Web Information Systems, vol. 11 no. 1
Type: Research Article
ISSN: 1744-0084

Article
Publication date: 7 October 2014

Bee Yee Liau and Pei Pei Tan

The purpose of this paper is to study the consumer opinion towards the low-cost airlines or low-cost carriers (LCCs) (these two terms are used interchangeably) industry in…

Abstract

Purpose

The purpose of this paper is to study consumer opinion towards the low-cost airlines or low-cost carriers (LCCs) (these two terms are used interchangeably) industry in Malaysia, to better understand consumers’ needs and provide better services. Sentiment analysis is undertaken to reveal current customers’ satisfaction levels towards low-cost airlines.

Design/methodology/approach

About 10,895 tweets (data collected over two and a half months) are analysed. Text mining techniques are used during data pre-processing, and a mixture of statistical techniques is used to segment the customers’ opinions.

Findings

The results with two different sentiment algorithms show that positive polarity outweighs negative polarity across the different algorithms. Clustering results show that both the K-Means and spherical K-Means algorithms delivered similar results, and the four main topics discussed by consumers on Twitter are customer service, LCC ticket promotions, flight cancellations and delays, and post-booking management.
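
A minimal sketch of the spherical K-Means comparison mentioned above: spherical K-Means is approximated here by L2-normalising TF-IDF vectors before ordinary K-Means, so distances behave like cosine dissimilarity; the sample tweets are illustrative, not the study's data.

from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import normalize

tweets = [
    "great service from the cabin crew",
    "friendly staff and quick check in",
    "flight delayed again, two hours late",
    "cancelled flight and no refund yet",
    "cheap ticket promo this weekend",
    "promo fares for the holiday season",
]

X = normalize(TfidfVectorizer(stop_words="english").fit_transform(tweets))
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(km.labels_)   # topic-style groups: service, delays/cancellations, promotions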

Practical implications

Gaining knowledge of customer sentiments, as well as making improvements on the four main topics discussed in this study (customer service, LCC ticket promotions, flight cancellations or delays, and post-booking management), will help LCCs to attract more customers and generate more profits.

Originality/value

This paper provides useful insights on customers’ sentiments and opinions towards LCCs by utilizing social media information.

Details

Industrial Management & Data Systems, vol. 114 no. 9
Type: Research Article
ISSN: 0263-5577
