Search results

1 – 10 of 208
Article
Publication date: 2 December 2019

Fuli Zhou, Ming K. Lim, Yandong He and Saurabh Pratap

Abstract

Purpose

The booming development of e-commerce has encouraged vehicle consumers to post individual reviews on online forums. The purpose of this paper is to probe into vehicle consumers' consumption behavior and to make recommendations for potential consumers from the viewpoint of textual comments.

Design/methodology/approach

A big data analytics-based approach is designed to discover vehicle consumer consumption behavior from an online perspective. To reduce the subjectivity of expert-based approaches, a parallel Naïve Bayes approach is designed to perform sentiment analysis, and the Saaty scale-based (SSC) scoring rule is employed to obtain a specific sentiment value for each attribute class, which enables multi-grade sentiment classification. To provide intelligent recommendations to potential vehicle customers, a novel SSC-VIKOR approach is developed to prioritize vehicle brand candidates from a big data analytics viewpoint.
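
As a rough illustration of the ranking step, the sketch below implements the standard VIKOR procedure in plain Python. It is not the paper's SSC-VIKOR method: the score matrix, the weights and the compromise parameter `v = 0.5` are hypothetical, and every criterion is assumed to be a benefit criterion (higher is better).

```python
def vikor(scores, weights, v=0.5):
    """Standard VIKOR: return (S, R, Q) per alternative; lower Q ranks higher.

    scores: one row per alternative, one column per criterion.
    Assumption: all criteria are benefit criteria (higher is better).
    """
    n_crit = len(weights)
    best = [max(row[j] for row in scores) for j in range(n_crit)]
    worst = [min(row[j] for row in scores) for j in range(n_crit)]

    S, R = [], []
    for row in scores:
        # Weighted, normalized distance of each criterion from its best value.
        d = [
            weights[j] * (best[j] - row[j]) / (best[j] - worst[j])
            if best[j] != worst[j] else 0.0
            for j in range(n_crit)
        ]
        S.append(sum(d))  # group utility
        R.append(max(d))  # individual regret

    s_min, s_max, r_min, r_max = min(S), max(S), min(R), max(R)
    Q = []
    for s, r in zip(S, R):
        qs = (s - s_min) / (s_max - s_min) if s_max != s_min else 0.0
        qr = (r - r_min) / (r_max - r_min) if r_max != r_min else 0.0
        Q.append(v * qs + (1 - v) * qr)
    return S, R, Q


# Hypothetical sentiment scores for three brands on three attributes.
brands = ["A", "B", "C"]
scores = [[7, 5, 9], [9, 7, 8], [5, 9, 6]]
weights = [0.5, 0.3, 0.2]
S, R, Q = vikor(scores, weights)
ranking = [brand for _, brand in sorted(zip(Q, brands))]
```

In this toy input, brand B ranks first because it is closest to the best value on the most heavily weighted attribute.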

Findings

The big data analysis indicates that “cost-effectiveness” is the factor vehicle consumers care about most, and the data mining results enable automakers to better understand consumer consumption behavior.

Research limitations/implications

The case study illustrates the effectiveness of the integrated method, which contributes to more precise operations management in marketing strategy, quality improvement and intelligent recommendation.

Originality/value

Research on consumer consumption behavior usually relies on survey-based methods, and most previous studies of comment analysis focus on binary analysis. The hybrid SSC-VIKOR approach is developed to fill this gap from a big data perspective.

Details

Industrial Management & Data Systems, vol. 120 no. 1
Type: Research Article
ISSN: 0263-5577

Article
Publication date: 3 November 2020

Femi Emmanuel Ayo, Olusegun Folorunso, Friday Thomas Ibharalu and Idowu Ademola Osinuga

Abstract

Purpose

Hate speech is an expression of intense hatred. Twitter has become a popular analytical tool for the prediction and monitoring of abusive behaviors. Hate speech detection with social media data has attracted special research attention in recent studies; hence the need to design a generic metadata architecture and an efficient feature extraction technique to enhance hate speech detection.

Design/methodology/approach

This study proposes a hybrid embeddings technique enhanced with a topic inference method, together with an improved cuckoo search neural network, for hate speech detection in Twitter data. The hybrid embeddings technique combines Term Frequency-Inverse Document Frequency (TF-IDF) for word-level feature extraction with Long Short-Term Memory (LSTM), a variant of the recurrent neural network architecture, for sentence-level feature extraction. The features extracted by the hybrid embeddings then serve as input to the improved cuckoo search neural network, which predicts whether a tweet is hate speech, offensive language or neither.
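
To make the word-level part concrete, here is a minimal TF-IDF sketch in plain Python. It assumes whitespace tokenization, term frequency as count divided by document length, and inverse document frequency as log(N / df). The abstract does not state which TF-IDF variant the study uses, and the toy documents are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per (non-empty) document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


docs = ["good day friends", "bad day", "good good news"]
vectors = tf_idf(docs)
```

Terms that occur in few documents (such as "bad" here) receive higher weights than terms spread across many documents.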

Findings

The proposed method showed better results than other related methods when tested on the collected Twitter datasets. To validate its performance, t-tests and post hoc multiple comparisons were used to compare the significance and means of the proposed method with those of other related methods for hate speech detection. A paired-sample t-test was also conducted to validate the performance of the proposed method against other related methods.
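
For readers unfamiliar with the paired-sample t-test mentioned above, the following sketch computes the test statistic from per-fold scores of two methods. The score lists are hypothetical and are not results from the study. The function returns only the t statistic and the degrees of freedom, leaving the p-value lookup to a statistics package.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(a, b):
    """Return (t, degrees_of_freedom) for paired samples a and b."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    # stdev() is the sample standard deviation (divides by n - 1).
    standard_error = stdev(diffs) / math.sqrt(n)
    return mean(diffs) / standard_error, n - 1


# Hypothetical F1-scores of two methods on the same five folds.
method_a = [0.91, 0.92, 0.90, 0.93, 0.91]
method_b = [0.88, 0.90, 0.89, 0.90, 0.89]
t_stat, dof = paired_t_statistic(method_a, method_b)
```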

Research limitations/implications

Finally, the evaluation results showed that the proposed method outperforms other related methods, with a mean F1-score of 91.3.

Originality/value

The main novelty of this study is the use of an automatic topic spotting measure based on a naïve Bayes model to improve feature representation.

Details

International Journal of Intelligent Computing and Cybernetics, vol. 13 no. 4
Type: Research Article
ISSN: 1756-378X

Book part
Publication date: 30 September 2020

Hera Khan, Ayush Srivastav and Amit Kumar Mishra

Abstract

This chapter provides a detailed description of the classification algorithms that have been widely used in medical science. It lays the foundation with a comprehensive overview of the background and history of classification algorithms, followed by an extensive discussion of classification techniques in machine learning (ML), and concludes with their applications to data analysis in medical science and health care. The chapter opens with the fundamentals required to understand classification techniques in ML: the underlying differences between unsupervised and supervised learning, the basic terminology of classification and its history. It then covers the types of classification algorithms, ranging from linear classifiers such as Logistic Regression and Naïve Bayes to Nearest Neighbour, Support Vector Machine, tree-based classifiers and Neural Networks, together with their respective mathematics. Ensemble algorithms such as Majority Voting, Boosting, Bagging and Stacking are also discussed at length along with their applications. The chapter further explains the areas in which such classification algorithms are applied in biomedicine and health care and their contribution to decision-making systems and predictive analysis. In conclusion, the chapter aims to contribute to research and development by providing thorough insight into classification algorithms and their applications in the healthcare development sector.
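
Of the ensemble techniques listed, majority voting is the simplest to show in a few lines. The sketch below fuses the label predictions of several classifiers; the three prediction lists are hypothetical stand-ins for the outputs of already trained models.

```python
from collections import Counter


def majority_vote(predictions):
    """Fuse per-classifier label lists (all the same length) by majority."""
    fused = []
    for labels in zip(*predictions):
        # most_common(1) picks the most frequent label for this sample;
        # on a tie, the label seen first wins.
        fused.append(Counter(labels).most_common(1)[0][0])
    return fused


# Hypothetical outputs of three classifiers on three patients.
predictions = [
    ["sick", "healthy", "healthy"],
    ["sick", "sick", "healthy"],
    ["healthy", "healthy", "sick"],
]
fused = majority_vote(predictions)
```

With an odd number of voters and two classes, ties cannot occur; other settings need an explicit tie-breaking rule.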

Details

Big Data Analytics and Intelligence: A Perspective for Health Care
Type: Book
ISBN: 978-1-83909-099-8

Article
Publication date: 1 February 2016

Yuxian Eugene Liang and Soe-Tsyr Daphne Yuan

Abstract

Purpose

What makes investors tick? Largely counter-intuitive compared to the findings of most past research, this study explores the possibility that funding investors invest in companies based on social relationships, which could be positive or negative, similar or dissimilar. The purpose of this paper is to build a social network graph using data from CrunchBase, the largest public database with profiles about companies. The authors combine social network analysis with the study of investing behavior in order to explore how similarity between investors and companies affects investing behavior through social network analysis.

Design/methodology/approach

This study crawls and analyzes data from CrunchBase and builds a social network graph which includes people, companies, social links and funding investment links. The problem is then formalized as a link (or relationship) prediction task in a social network to model and predict (across various machine learning methods and evaluation metrics) whether an investor will create a link to a company in the social network. Various link prediction techniques such as common neighbors, shortest path, Jaccard Coefficient and others are integrated to provide a holistic view of a social network and provide useful insights as to how a pair of nodes may be related (i.e., whether the investor will invest in the particular company at a time) within the social network.
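
The link prediction scores named above can be sketched on a small adjacency-set graph. The graph below is a hypothetical toy network, not CrunchBase data, and the three functions are generic textbook definitions rather than the study's implementation.

```python
from collections import deque


def common_neighbors(graph, u, v):
    """Nodes adjacent to both u and v."""
    return graph[u] & graph[v]


def jaccard_coefficient(graph, u, v):
    """Shared neighbors divided by all neighbors of u or v."""
    union = graph[u] | graph[v]
    return len(graph[u] & graph[v]) / len(union) if union else 0.0


def shortest_path_length(graph, source, target):
    """Breadth-first search; returns None if target is unreachable."""
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


# Hypothetical undirected network of one investor, one company and people.
graph = {
    "investor": {"alice", "bob"},
    "company": {"alice", "bob", "carol"},
    "alice": {"investor", "company"},
    "bob": {"investor", "company"},
    "carol": {"company"},
}
shared = common_neighbors(graph, "investor", "company")
score = jaccard_coefficient(graph, "investor", "company")
distance = shortest_path_length(graph, "investor", "company")
```

Each score can serve as one feature for a classifier that predicts whether the investor-company link will form.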

Findings

This study finds that funding investors are more likely to invest in a particular company if they have a stronger social relationship in terms of closeness, be it direct or indirect. At the same time, if investors and companies share too many common neighbors, investors are less likely to invest in such companies.

Originality/value

The authors' study is among the first to use data from CrunchBase, the largest public company profile database, as a social network for research purposes. The authors also identify certain social relationship factors that can help prescribe investor funding behavior. The authors' prediction strategy, which is based on these factors and models the task as a link prediction problem, generally works well across the most prominent learning algorithms and performs well both in aggregate and for individual industries. In other words, this study encourages companies to focus on social relationship factors in addition to other factors when seeking external funding investments.

Details

Internet Research, vol. 26 no. 1
Type: Research Article
ISSN: 1066-2243

Open Access
Article
Publication date: 14 July 2022

Karlo Puh and Marina Bagić Babac

Abstract

Purpose

As the tourism industry becomes more vital for the success of many economies around the world, the importance of technology in tourism grows daily. Alongside the increasing importance and popularity of tourism, the amount of significant data grows, too. On a daily basis, millions of people write their opinions, suggestions and views about accommodation, services, and much more on various websites. Well-processed and filtered data can provide much useful information that can be used to improve tourists' experiences and to help in selecting a hotel or a restaurant. Thus, the purpose of this study is to explore machine and deep learning models for predicting sentiment and rating from tourist reviews.

Design/methodology/approach

This paper used machine learning models such as Naïve Bayes, support vector machines (SVM), convolutional neural network (CNN), long short-term memory (LSTM) and bidirectional long short-term memory (BiLSTM) for extracting sentiment and ratings from tourist reviews. These models were trained to classify reviews into positive, negative, or neutral sentiment, and into one to five grades or stars. Data used for training the models were gathered from TripAdvisor, the world's largest travel platform. The models based on multinomial Naïve Bayes (MNB) and SVM were trained using the term frequency-inverse document frequency (TF-IDF) for word representations while deep learning models were trained using global vectors (GloVe) for word representation. The results from testing these models are presented, compared and discussed.
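
As a simplified illustration of the multinomial Naïve Bayes baseline, the sketch below trains on raw word counts with Laplace smoothing. This is an assumption made for brevity: the study trains MNB on TF-IDF representations, and the four toy reviews are hypothetical, not TripAdvisor data.

```python
import math
from collections import Counter, defaultdict


def train_mnb(docs, labels, alpha=1.0):
    """Return {label: (log_prior, {word: log_likelihood})}."""
    class_docs = Counter(labels)
    word_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        word_counts[label].update(doc.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    model = {}
    for label, n_docs in class_docs.items():
        # Laplace smoothing: add alpha to every word count in the vocabulary.
        total = sum(word_counts[label].values()) + alpha * len(vocab)
        log_lik = {
            w: math.log((word_counts[label][w] + alpha) / total)
            for w in vocab
        }
        model[label] = (math.log(n_docs / len(labels)), log_lik)
    return model


def predict_mnb(model, doc):
    """Pick the label with the highest log posterior."""
    tokens = doc.lower().split()
    scores = {}
    for label, (log_prior, log_lik) in model.items():
        # Words that never appeared in training are skipped.
        scores[label] = log_prior + sum(
            log_lik[t] for t in tokens if t in log_lik
        )
    return max(scores, key=scores.get)


reviews = [
    "great clean room",
    "friendly staff great location",
    "dirty room rude staff",
    "terrible dirty bathroom",
]
sentiments = ["positive", "positive", "negative", "negative"]
model = train_mnb(reviews, sentiments)
```

Calling `predict_mnb(model, "great friendly staff")` then scores each class by its log prior plus the summed log likelihoods of the known words.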

Findings

The machine learning and deep learning models achieved high accuracy in predicting positive, negative, or neutral sentiments and ratings from tourist reviews. The optimal model architecture for both classification tasks was a deep learning model based on BiLSTM. The study’s results confirmed that deep learning models are more efficient and accurate than machine learning algorithms.

Practical implications

The proposed models allow for forecasting the number of tourist arrivals and expenditure, gaining insights into the tourists' profiles, improving overall customer experience, and upgrading marketing strategies. Different service sectors can use the implemented models to get insights into customer satisfaction with the products and services as well as to predict the opinions given a particular context.

Originality/value

This study developed and compared different machine learning models for classifying customer reviews as positive, negative, or neutral, as well as predicting ratings with one to five stars based on a TripAdvisor hotel reviews dataset that contains 20,491 unique hotel reviews.

Details

Journal of Hospitality and Tourism Insights, vol. 6 no. 3
Type: Research Article
ISSN: 2514-9792

Article
Publication date: 28 July 2020

Sathyaraj R, Ramanathan L, Lavanya K, Balasubramanian V and Saira Banu J

Abstract

Purpose

Innovation in big data is increasing day by day, to the point that conventional software tools face several problems in managing the data. Moreover, the occurrence of imbalanced data in massive data sets is a major constraint for the research industry.

Design/methodology/approach

The purpose of the paper is to introduce a big data classification technique that uses the MapReduce framework and is based on an optimization algorithm. Classification is enabled by the MapReduce framework, which uses the proposed optimization algorithm, named the chicken-based bacterial foraging (CBF) algorithm. The proposed algorithm is generated by integrating the bacterial foraging optimization (BFO) algorithm with the cat swarm optimization (CSO) algorithm. The proposed model runs in two phases, training and testing. In the training phase, the big data produced by different distributed sources is processed in parallel by the mappers, which perform preprocessing and feature selection based on the proposed CBF algorithm. The preprocessing step eliminates redundant and inconsistent data, whereas the feature selection step extracts the significant features from the preprocessed data to improve classification accuracy. The selected features are fed into the reducers for classification using the deep belief network (DBN) classifier, which is trained using the proposed CBF algorithm so that the data are classified into various classes; at the end of the training process, the individual reducers output the trained models. Thus, the incremental data are handled effectively based on the model obtained in the training phase. In the testing phase, the incremental data are split into subsets and fed into different mappers for classification. Each mapper contains a trained model obtained from the training phase, which it uses to classify the incremental data. After classification, the outputs of the mappers are fused and fed into the reducer for the classification.
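
The data flow of the testing phase (split, map, fuse, reduce) can be sketched in a single process as follows. This only mimics the MapReduce pattern: the `threshold_model` function is a hypothetical stand-in for a trained classifier, not the CBF-trained DBN, and the records are made-up values.

```python
from collections import Counter


def split(records, n_parts):
    """Partition records into n_parts subsets (round-robin)."""
    return [records[i::n_parts] for i in range(n_parts)]


def mapper(model, subset):
    """Classify every record in one subset with the trained model."""
    return [(record_id, model(features)) for record_id, features in subset]


def reducer(mapped_outputs):
    """Fuse the mapper outputs into one result and count the classes."""
    merged = {}
    for part in mapped_outputs:
        merged.update(dict(part))
    return merged, Counter(merged.values())


def threshold_model(features):
    # Hypothetical stand-in for a trained classifier.
    return "class_1" if sum(features) / len(features) > 0.5 else "class_0"


# Hypothetical incremental data: (record id, feature list).
records = [
    (0, [0.9, 0.8]),
    (1, [0.1, 0.2]),
    (2, [0.7, 0.6]),
    (3, [0.3, 0.4]),
    (4, [0.2, 0.9]),
]
mapped = [mapper(threshold_model, part) for part in split(records, 2)]
labels, class_counts = reducer(mapped)
```

In a real deployment each `mapper` call would run on a separate worker; here they run sequentially to keep the example self-contained.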

Findings

The maximum accuracy and Jaccard coefficient are obtained using the epileptic seizure recognition database. The proposed CBF-DBN produces a maximal accuracy of 91.129%, whereas the accuracy values of the existing neural network (NN), DBN and naive Bayes classifier-term frequency–inverse document frequency (NBC-TFIDF) are 82.894%, 86.184% and 86.512%, respectively. The proposed CBF-DBN produces a maximal Jaccard coefficient of 88.928%, whereas the Jaccard coefficient values of the existing NN, DBN and NBC-TFIDF are 75.891%, 79.850% and 81.103%, respectively.

Originality/value

In this paper, a big data classification method is proposed for categorizing massive data sets and meeting the constraints of huge data. The classification is performed on the MapReduce framework in training and testing phases so that the data are handled in parallel. In the training phase, the big data is partitioned into subsets that are fed into the mappers, where the feature extraction step extracts the significant features. The obtained features are passed to the reducers, which classify the data using the DBN classifier trained with the proposed CBF algorithm, and the trained model is obtained as output. In the testing phase, the incremental data are considered: new data are first split into subsets and fed into the mappers, which classify them using the trained models obtained from the training phase. The classified results from each mapper are fused and fed into the reducer for the classification of big data.

Details

Data Technologies and Applications, vol. 55 no. 3
Type: Research Article
ISSN: 2514-9288

Book part
Publication date: 4 December 2020

Gauri Rajendra Virkar and Supriya Sunil Shinde

Abstract

Predictive analytics is the science of decision-making that removes guesswork from the decision-making process and applies proven scientific procedures to find the right solutions. Predictive analytics provides ideas on the occurrence of future downtimes and rejections, thereby helping users take preventive action before abnormalities occur. Given these advantages, predictive analytics has been adopted in diverse fields such as health care, finance, education, marketing and automotive. Predictive analytics tools can be used to predict various behaviors and patterns, thereby saving their users time and money. Many open-source predictive analysis tools, such as R, scikit-learn, Konstanz Information Miner (KNIME), Orange, RapidMiner and Waikato Environment for Knowledge Analysis (WEKA), are freely available to users. This chapter aims to identify the most accurate tools and techniques for the classification task that aid in decision-making. Our experimental results show that no specific tool provides the best results in all scenarios; rather, the outcome depends on the datasets and the classifier.

Article
Publication date: 6 October 2023

Vahide Bulut

Abstract

Purpose

Feature extraction from 3D datasets is a current problem. Machine learning is an important tool for classification of complex 3D datasets. Machine learning classification techniques are widely used in various fields, such as text classification, pattern recognition, medical disease analysis, etc. The aim of this study is to apply the most popular classification and regression methods to determine the best classification and regression method based on the geodesics.

Design/methodology/approach

The feature vector is determined by the unit normal vector and the unit principal vector at each point of the 3D surface along with the point coordinates themselves. Moreover, different examples are compared according to the classification methods in terms of accuracy and the regression algorithms in terms of R-squared value.
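
A minimal sketch of the feature vector and the two evaluation scores mentioned above is given below. It assumes the feature vector is a plain concatenation of the point coordinates, the unit normal and the unit principal vector. The abstract does not specify the exact layout, and all sample values are hypothetical.

```python
import math


def unit(vector):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(c * c for c in vector))
    return [c / norm for c in vector]


def feature_vector(point, normal, principal):
    """Concatenate coordinates, unit normal and unit principal vector."""
    return list(point) + unit(normal) + unit(principal)


def accuracy(y_true, y_pred):
    """Fraction of labels predicted correctly (classification)."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination (regression)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


features = feature_vector([1, 2, 3], [0, 0, 2], [3, 0, 0])
acc = accuracy(["a", "b", "a", "b"], ["a", "b", "b", "b"])
r2 = r_squared([1, 2, 3, 4], [1.1, 1.9, 3.2, 3.8])
```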

Findings

Several surface examples are analyzed for the feature vector using classification (31 methods) and regression (23 methods) machine learning algorithms. In addition, two ensemble methods, XGBoost and LightGBM, are used for classification and regression. The scores for each surface example are also compared.

Originality/value

To the best of the author’s knowledge, this is the first study to analyze datasets based on geodesics using machine learning algorithms for classification and regression.

Details

Engineering Computations, vol. 40 no. 9/10
Type: Research Article
ISSN: 0264-4401

Article
Publication date: 19 August 2021

Renuka Devi D. and Sasikala S.

Abstract

Purpose

The purpose of this paper is to enhance the accuracy of classification of streaming big data sets with less processing time. This kind of social analytics would contribute to society by providing inferred decisions at the right time. The work targets the streaming nature of Twitter data sets.

Design/methodology/approach

Analysing the growing volume of Twitter data with conventional methods is a demanding task, so MapReduce (MR) is used for fast analytics. The online feature selection (OFS) accelerated bat algorithm (ABA) and the ensemble incremental deep multiple layer perceptron (EIDMLP) classifier are proposed for feature selection and classification. Three Twitter data sets under varied categories (product, service and emotions) are investigated. The proposed model is compared with the Particle Swarm Optimization (PSO), Accelerated Particle Swarm Optimization (APSO) and accelerated simulated annealing and mutation operator (ASAMO) feature selection algorithms, and with classifiers such as Naïve Bayes (NB), support vector machine (SVM), Hoeffding tree (HT) and fuzzy minimal consistent class subset coverage with the k-nearest neighbour (FMCCSC-KNN).

Findings

The proposed model is compared with the PSO, APSO and ASAMO feature selection algorithms, and with classifiers such as Naïve Bayes (NB), support vector machine (SVM), Hoeffding Tree (HT) and Fuzzy Minimal Consistent Class Subset Coverage with the K-Nearest Neighbour (FMCCSC-KNN). The proposed model achieved accuracies of 99%, 99.48% and 98.9% on the given data sets, with processing times of 0.0034, 0.0024 and 0.0053 seconds, respectively.

Originality/value

A novel framework is proposed for feature selection and classification. The work is compared with the authors’ previously developed classifiers and with other state-of-the-art feature selection and classification algorithms.

Details

International Journal of Web Information Systems, vol. 17 no. 6
Type: Research Article
ISSN: 1744-0084

Article
Publication date: 16 August 2021

Nur Azreen Zulkefly, Norjihan Abdul Ghani, Christie Pei-Yee Chin, Suraya Hamid and Nor Aniza Abdullah

Abstract

Purpose

Predicting the impact of social entrepreneurship is crucial as it can help social entrepreneurs to determine the achievement of their social mission and performance. However, there is a lack of existing social entrepreneurship models to predict social enterprises' social impacts. This paper aims to propose the social impact prediction model for social entrepreneurs using a data analytic approach.

Design/methodology/approach

This study implemented an experimental method using three different algorithms: naive Bayes, k-nearest neighbor and J48 decision tree algorithms to develop and test the social impact prediction model.
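
Of the three algorithms compared, k-nearest neighbor is the most compact to sketch. The example below uses Euclidean distance and a majority vote among the `k` closest training points. The two-feature training set and the "low"/"high" impact labels are hypothetical and do not come from the study's variables.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Label a query point by majority vote among its k nearest neighbors."""
    nearest = sorted(
        zip(train_x, train_y),
        key=lambda pair: math.dist(pair[0], query),  # Euclidean distance
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Hypothetical enterprises described by two numeric indicators.
train_x = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
train_y = ["low", "low", "low", "high", "high", "high"]
near_low = knn_predict(train_x, train_y, (2, 2))
near_high = knn_predict(train_x, train_y, (7, 8))
```

In practice the features should be scaled to comparable ranges first, because Euclidean distance is dominated by the feature with the largest range.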

Findings

The accuracy of the developed social impact prediction model rests on the list of identified social impact prediction variables, which were evaluated by social entrepreneurship experts. Across the three algorithms used to implement the model, the results showed that naive Bayes is the best-performing classifier in terms of social impact prediction accuracy.

Research limitations/implications

Although there are three categories of social entrepreneurship impact, this research focuses only on social impact. The future of social entrepreneurship research will be bright if it can address all three categories. Future research in this area could look beyond these three categories so that the prediction of social impact becomes broader. Prospective researchers can also look beyond the differences and similarities among economic, social and environmental impacts and study these impacts from an overall perspective.

Originality/value

This paper fulfills the need of the Malaysian social entrepreneurship blueprint to design social impact in social entrepreneurship. No existing prediction model can be used to predict social impact in Malaysia. This study also contributes to social entrepreneurship researchers, as the new social impact prediction variables it identifies can be used to predict social impact in social entrepreneurship in the future, which may improve prediction performance.

Details

Internet Research, vol. 32 no. 2
Type: Research Article
ISSN: 1066-2243
