Search results
1 – 10 of over 3,000
Samira Khodabandehlou and Mahmoud Zivari Rahman
Abstract
Purpose
This paper aims to provide a predictive framework of customer churn through six stages for accurate prediction and preventing customer churn in the field of business.
Design/methodology/approach
The six stages are as follows: first, collection of customer behavioral data and preparation of the data; second, the formation of derived variables and selection of influential variables, using a method of discriminant analysis; third, selection of training and testing data and reviewing their proportion; fourth, the development of prediction models using simple, bagging and boosting versions of supervised machine learning; fifth, comparison of churn prediction models based on different versions of machine-learning methods and selected variables; and sixth, providing appropriate strategies based on the proposed model.
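The fourth and fifth stages, training simple, bagging and boosting versions of a learner and comparing them, can be sketched as follows. This is an illustrative scikit-learn sketch on synthetic data, not the authors' grocery-store data set or their exact configuration:

```python
# Compare simple, bagged and boosted versions of the same base learner
# (illustrative synthetic data, not the study's customer records).
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

models = {
    "simple": DecisionTreeClassifier(max_depth=3, random_state=0),
    "bagging": BaggingClassifier(DecisionTreeClassifier(max_depth=3),
                                 n_estimators=50, random_state=0),
    "boosting": AdaBoostClassifier(DecisionTreeClassifier(max_depth=3),
                                   n_estimators=50, random_state=0),
}
# Held-out accuracy per version of the learner
scores = {name: m.fit(X_tr, y_tr).score(X_te, y_te) for name, m in models.items()}
print(scores)
```

On the authors' data the boosting version was the strongest; on an arbitrary synthetic set the ordering may differ.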
Findings
According to the results, five variables, the number of items, reception of returned items, the discount, the distribution time and the prize, beside the recency, frequency and monetary (RFM) variables (RFMITSDP), were chosen as the best predictor variables. The proposed model, with an accuracy of 97.92 per cent, performed much better in churn prediction than RFM. Among the supervised machine learning methods, the artificial neural network (ANN) had the highest accuracy and decision trees (DT) the lowest. The results also show the substantial superiority of the boosting versions in prediction compared with the simple and bagging models.
Research limitations/implications
The period of the available data was limited to two years. The research data came from only one grocery store and may not be applicable to other industries; therefore, the results should be generalized to other business centers with caution.
Practical implications
Business owners should enforce a clear rule that provides a prize for a certain number of purchased items; the prize can, of course, be something other than the purchased item. They should accept items returned by customers for any reason, and the conditions and deadline for accepting returned items must be clearly communicated to customers. Store owners should offer a discount for a certain amount of purchase and use an exponential rule that increases the discount as the purchase amount grows, to encourage customers to buy more. Managers of large stores should deliver ordered items quickly, using well-equipped, modern transport vehicles and a skilled, friendly workforce. The types of services, the rules for prizes and discounts, the rules for accepting returned items and the method of distributing items should be posted in the store for all customers to see, and the store's special services and reward rules should be communicated to customers through new media such as social networks. To predict customer behavior from data, future researchers should use the boosting method, because it increases the efficiency and accuracy of prediction, and the ANN method is recommended for predicting customer behavior, particularly churn status. To extract and select the important variables influencing customer behavior, discriminant analysis can be used; it is a very accurate and powerful method for predicting customer classes.
Originality/value
The current study tries to fill this gap by considering five basic and important variables besides RFM in stores, i.e. prize, discount, accepting returns, delay in distribution and the number of items, so that business owners can understand the role that services such as prizes, discounts, distribution and accepting returns play in retaining customers and preventing churn. Another innovation is the comparison of machine-learning methods with their boosting and bagging versions, especially given that previous studies do not consider the bagging method. A further motivation is the conflicting results regarding which machine-learning method predicts customer behaviors, including churning, most accurately: some studies introduce ANN (Huang et al., 2010; Hung and Wang, 2004; Keramati et al., 2014; Runge et al., 2014), some support vector machine (Guo-en and Wei-dong, 2008; Vafeiadis et al., 2015; Yu et al., 2011) and some DT (Freund and Schapire, 1996; Qureshi et al., 2013; Umayaparvathi and Iyakutti, 2012) as the best predictor, leaving users of these studies unsure which method is best. The current study identifies the best prediction method specifically for store businesses, for both researchers and owners. Moreover, it uses discriminant analysis, not used in previous studies, to select and filter the variables that are important and effective in predicting churners and non-churners. The study is therefore unique in its variables, its method of comparing accuracy and its method of selecting effective variables.
Nasim Eslamirad, Soheil Malekpour Kolbadinejad, Mohammadjavad Mahdavinejad and Mohammad Mehranrad
Abstract
Purpose
This research introduces a new methodology for integrating urban design strategies with the supervised machine learning (SML) method, applying both energy engineering modeling (an evaluation phase for existing green sidewalks) and statistical energy modeling (a prediction phase for new ones), to offer algorithms that help find the optimum morphology of green sidewalks, with high outdoor thermal comfort and few errors in the results.
Design/methodology/approach
The study's core tool is SML, which predicts future cases from past ones; the machine learning is implemented in Python. The structure of the study consists of two main parts, as in the majority of similar studies: engineering energy modeling and statistical energy modeling. First, from 2,268 models, some are randomly selected, simulated and sensitivity-analyzed in ENVI-met. The ENVI-met output, the predicted mean vote (PMV) as a measure of thermal comfort, together with weather variables, then serves as input to Python. The resulting data set is processed by SML to reach the final, reliable predicted output.
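The statistical energy modeling phase described above, fitting a model on the simulated samples to predict PMV for unsimulated sidewalk configurations, might look roughly like the following sketch. The feature names and the stand-in PMV formula are invented for illustration; the study's actual ENVI-met outputs and variables are not reproduced here:

```python
# Toy regression stand-in for the ENVI-met -> Python -> SML pipeline:
# train on "simulated" samples, predict PMV for unseen configurations.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Invented features: air temperature, relative humidity, wind speed, green-cover ratio
X = rng.uniform([20, 30, 0, 0], [40, 90, 5, 1], size=(300, 4))
# Stand-in for the PMV values ENVI-met would produce for each configuration
pmv = 0.1 * X[:, 0] - 0.01 * X[:, 1] - 0.3 * X[:, 2] - 1.5 * X[:, 3]
X_tr, X_te, y_tr, y_te = train_test_split(X, pmv, random_state=0)

model = RandomForestRegressor(random_state=0).fit(X_tr, y_tr)
r2 = model.score(X_te, y_te)  # error evaluation on held-out configurations
```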
Findings
The process of SML allows the study to find the thermal comfort of the current models and of other similar sidewalks. The results are evaluated by both the PMV mathematical model and SML error evaluation functions, and they confirm that the average error is about 1 per cent, so the method is reliable enough to apply in a variety of similar fields. The findings can support sustainable architecture strategies at building and urban scales, to determine, monitor and control energy-based behaviors (thermal comfort, heating, cooling, lighting and ventilation) in the operational phase of existing buildings and constructions and in the planning and design phase of future built cases, over their whole life spans.
Research limitations/implications
The limitations of the study relate to the study variables and alternatives, which have a notable impact on the findings. Furthermore, the more trustworthy the input data, the more accurate the output; the modeling and simulation processes are therefore the most significant part of the research for reaching exact results in the final step.
Practical implications
The findings can inform urban design strategies. With outdoor thermal comfort estimated by the machine learning method, urban and landscape designers, policymakers and architects can assess how their designs will perform in terms of air quality and urban health, and can be confident of meeting thermal comfort goals in the urban atmosphere.
Social implications
By 2030, about three out of five people will live in cities. Because green infrastructure helps moderate a city's climate, green spaces are linked to inhabitants' thermal comfort. Although improving outdoor thermal comfort through design methods is not a new subject, applying machine learning to predict the results is a new insight, supporting more effective design strategies and better-prepared, more comfortable urban environments. The study's social contribution thus lies in learning from previous projects and developing more efficient strategies, with less money and time, to make cities more comfortable and healthy places to live.
Originality/value
The study's achievements are expected to apply not only in Tehran but also in other climate zones, as a pattern for eco-city design strategies. Although similar studies exist in other disciplines, the concept of this study is a new vision in urban studies.
Yoon-Sung Kim, Hae-Chang Rim and Do-Gil Lee
Abstract
Purpose
The purpose of this paper is to propose a methodology to analyze a large amount of unstructured textual data into categories of business environmental analysis frameworks.
Design/methodology/approach
This paper uses machine learning to classify a vast amount of unstructured textual data by category of business environmental analysis framework. Generally, producing large quantities of high-quality training data for a machine-learning-based system is costly, so semi-supervised learning techniques are used to improve the classification performance. Additionally, the lack-of-features problem from which traditional classification systems suffer is resolved by applying semantic features obtained through word embedding, a recent technique in text mining.
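The semi-supervised idea described above, letting a classifier bootstrap itself from a small labeled seed plus a large unlabeled pool, can be illustrated with scikit-learn's self-training wrapper. This is a generic sketch on synthetic vectors, not the paper's corpus or its word-embedding features:

```python
# Self-training: fit on ~10% labeled data, pseudo-label the rest iteratively.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import SelfTrainingClassifier

X, y = make_classification(n_samples=500, random_state=0)
rng = np.random.default_rng(0)
y_partial = y.copy()
y_partial[rng.random(len(y)) < 0.9] = -1  # -1 marks "unlabeled" for sklearn

model = SelfTrainingClassifier(LogisticRegression(max_iter=1000))
model.fit(X, y_partial)
accuracy = model.score(X, y)  # evaluated against the full ground truth
```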
Findings
The proposed methodology can be used for various business environmental analyses and the system is fully automated in both the training and classifying phases. Semi-supervised learning can solve the problems with insufficient training data. The proposed semantic features can be helpful for improving traditional classification systems.
Research limitations/implications
This paper focuses on classifying sentences that contain business environmental analysis information in a large number of documents. However, the proposed methodology is limited with respect to advanced analyses that could directly help managers establish strategies, since it does not summarize the environmental variables implied in the classified sentences. Advanced summarization and recommendation techniques could extract these environmental variables from the sentences and assist managers in establishing effective strategies.
Originality/value
The feature selection technique developed in this paper has not been used in traditional systems for business and industry, so the whole process can be fully automated. The approach is practical enough to be applied to various business environmental analysis frameworks. In addition, the system is more economical than traditional systems because of semi-supervised learning, and it resolves the lack-of-features problem from which traditional systems suffer. This work is valuable for analyzing environmental factors and establishing strategies for companies.
Harleen Kaur and Vinita Kumari
Abstract
Diabetes is a major metabolic disorder that can adversely affect the entire body system. Undiagnosed diabetes can increase the risk of cardiac stroke, diabetic nephropathy and other disorders. Millions of people all over the world are affected by this disease, and early detection of diabetes is very important for maintaining a healthy life. The disease is a matter of global concern, as cases of diabetes are rising rapidly. Machine learning (ML) is a computational method for learning automatically from experience and improving performance to make more accurate predictions. In the current research, we applied ML techniques to the Pima Indian diabetes dataset to identify trends and detect patterns in risk factors, using R as the data manipulation tool. To classify patients as diabetic or non-diabetic, we developed and analyzed five predictive models built with supervised machine learning algorithms: linear kernel support vector machine (SVM-linear), radial basis function (RBF) kernel support vector machine, k-nearest neighbour (k-NN), artificial neural network (ANN) and multifactor dimensionality reduction (MDR).
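The study itself was carried out in R; as a rough illustration of the same kind of comparison, four of the five classifiers (MDR has no standard scikit-learn implementation) can be benchmarked in Python. Synthetic stand-in data is used here, not the Pima Indian diabetes data set:

```python
# Cross-validated comparison of SVM-linear, SVM-RBF, k-NN and ANN
# on synthetic data (illustrative only).
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=400, n_features=8, random_state=0)
classifiers = {
    "SVM-linear": SVC(kernel="linear"),
    "SVM-RBF": SVC(kernel="rbf"),
    "k-NN": KNeighborsClassifier(n_neighbors=5),
    "ANN": MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0),
}
cv_scores = {
    # Scaling matters for SVM, k-NN and ANN, hence the pipeline.
    name: cross_val_score(make_pipeline(StandardScaler(), clf), X, y, cv=5).mean()
    for name, clf in classifiers.items()
}
```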
Francisco Villarroel Ordenes and Shunyuan Zhang
Abstract
Purpose
The purpose of this paper is to describe and position the state-of-the-art of text and image mining methods in business research. By providing a detailed conceptual and technical review of both methods, it aims to increase their utilization in service research.
Design/methodology/approach
In the first stage, the authors review business literature in marketing, operations and management concerning the use of text and image mining methods. In the second stage, the authors identify and analyze empirical papers that used text and image mining methods in services journals and premier business journals. Finally, avenues for further research in services are provided.
Findings
The manuscript identifies seven text mining methods and describes the approaches, processes, techniques and algorithms involved in their implementation. Four of these methods are positioned similarly for image mining. There are 39 papers using text mining in service research, with a focus on measuring consumer sentiment, experiences and service quality. Because image mining has not yet been used in service journals, the authors review its application in marketing and management and suggest ideas for further research in services.
Research limitations/implications
This manuscript focuses on the different methods and their implementation in service research, but it does not offer a complete review of business literature using text and image mining methods.
Practical implications
The results have a number of implications for the discipline that are presented and discussed. The authors provide research directions using text and image mining methods in service priority areas such as artificial intelligence, frontline employees, transformative consumer research and customer experience.
Originality/value
The manuscript provides an introduction to text and image mining methods to service researchers and practitioners interested in the analysis of unstructured data. This paper provides several suggestions concerning the use of new sources of data (e.g. customer reviews, social media images, employee reviews and emails), measurement of new constructs (beyond sentiment and valence) and the use of more recent methods (e.g. deep learning).
Abstract
With the advent of Big Data, storing and using an unprecedented amount of clinical information is now feasible via Electronic Health Records (EHRs). The massive collection of clinical data by health care systems and treatment centers can be productively used to perform predictive analytics on treatment plans to improve patient health outcomes. These massive data sets have stimulated opportunities to adapt computational algorithms to track and identify target areas for quality improvement in health care.
According to a report from the Association of American Medical Colleges, there will be an alarming gap between demand and supply in the health care workforce in the near future: projections show that by 2032 there will be a shortfall of between 46,900 and 121,900 physicians in the US (AAMC, 2019). Early prediction of health care risks is therefore a pressing requirement for improving health care quality and reducing health care costs. Predictive analytics uses historical data and algorithms based on either statistics or machine learning to develop predictive models that capture important trends; such models can predict the likelihood of future events. Predictive models developed using supervised machine learning approaches are commonly applied to various health care problems such as disease diagnosis, treatment selection and treatment personalization.
This chapter provides an overview of various machine learning and statistical techniques for developing predictive models. Case examples from the extant literature are provided to illustrate the role of predictive modeling in health care research. Together with adaptation of these predictive modeling techniques with Big Data analytics underscores the need for standardization and transparency while recognizing the opportunities and challenges ahead.
Ammara Zamir, Hikmat Ullah Khan, Waqar Mehmood, Tassawar Iqbal and Abubakker Usman Akram
Abstract
Purpose
This research study proposes a feature-centric spam email detection model (FSEDM) based on content, sentiment, semantic, user and spam-lexicon features set. The purpose of this study is to exploit the role of sentiment features along with other proposed features to evaluate the classification accuracy of machine learning algorithms for spam email detection.
Design/methodology/approach
Existing studies primarily exploit a content-based feature engineering approach, but consider only a limited number of features. This research study therefore proposes a feature-centric framework (FSEDM) based on existing and novel features of the email data set, extracted after pre-processing. Diverse supervised learning techniques are then applied to the proposed features, in conjunction with feature selection techniques such as information gain, gain ratio and Relief-F, to rank the most prominent features and classify emails as spam or ham (not spam).
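One of the ranking criteria mentioned above, information gain, is equivalent to the mutual information between a feature and the class label. A minimal sketch with scikit-learn follows, on synthetic features rather than the paper's email data or exact pipeline:

```python
# Rank features by mutual information (information gain) with the class label.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif

X, y = make_classification(n_samples=300, n_features=6, n_informative=3,
                           random_state=0)
mi = mutual_info_classif(X, y, random_state=0)
ranking = np.argsort(mi)[::-1]  # feature indices, most informative first
```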
Findings
Analysis and experimental results indicate that the proposed model with sentiment analysis is a competitive approach for spam email detection. Using the proposed model, a deep neural network applied with sentiment features outperformed the other classifiers, reaching a classification accuracy of up to 97.2 per cent.
Originality/value
This research is novel in that no previous research has focused on sentiment analysis in conjunction with other email features for the detection of spam emails.
Zhuoxuan Jiang, Chunyan Miao and Xiaoming Li
Abstract
Purpose
Recent years have witnessed the rapid development of massive open online courses (MOOCs). With more and more courses produced by instructors and taken by learners all over the world, unprecedented massive educational resources are being aggregated. These resources include videos, subtitles, lecture notes, quizzes, etc., on the teaching side, and forum contents, wikis, logs of learning behavior, logs of homework, etc., on the learning side. However, the data are both unstructured and diverse. To facilitate knowledge management and mining on MOOCs, extracting keywords from the resources is important. This paper aims to adapt state-of-the-art techniques to MOOC settings and evaluate their effectiveness on real data. In terms of practice, this paper also tries to answer, for the first time, to what extent MOOC resources can support keyword extraction models and how much human effort is required to make the models work well.
Design/methodology/approach
Based on which side generates the data, i.e. instructors or learners, the data are classified into teaching resources and learning resources, respectively. The approach used on teaching resources is based on machine learning models with labels, while the approach used on learning resources is based on a graph model without labels.
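The abstract does not specify the graph model; a common choice for label-free keyword extraction is a TextRank-style co-occurrence graph, sketched here in plain Python on a toy corpus. The assumption that the authors use something TextRank-like is ours:

```python
# Minimal TextRank-style keyword ranking: build a word co-occurrence graph
# from sentences, then score nodes with iterative PageRank-like updates.
from collections import defaultdict
from itertools import combinations

sentences = [
    "machine learning models need labeled data",
    "graph models rank words without labels",
    "keyword extraction helps knowledge mining",
    "forum data supports keyword extraction models",
]

# Words sharing a sentence are linked in an undirected graph.
graph = defaultdict(set)
for s in sentences:
    for a, b in combinations(set(s.split()), 2):
        graph[a].add(b)
        graph[b].add(a)

# Iterative PageRank-style scoring (damping factor 0.85).
scores = {w: 1.0 for w in graph}
for _ in range(30):
    scores = {
        w: 0.15 + 0.85 * sum(scores[n] / len(graph[n]) for n in graph[w])
        for w in graph
    }

keywords = sorted(scores, key=scores.get, reverse=True)[:3]
```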
Findings
From the teaching resources, the authors' methods can accurately extract keywords with only 10 per cent labeled data. The authors find that resources of various forms, e.g. subtitles and PPTs, should be considered separately because the models perform differently on them. From the learning resources, the keywords extracted from MOOC forums are not as domain-specific as those extracted from teaching resources, but they reflect the topics that are actively discussed in the forums, giving instructors useful feedback. The authors implement two applications with the extracted keywords: generating a concept map and generating a learning path. Visual demos show that both have the potential to improve learning efficiency when integrated into a real MOOC platform.
Research limitations/implications
Conducting keyword extraction on MOOC resources is quite difficult because teaching resources are hard to obtain due to copyright. Obtaining labeled data is also difficult, because expertise in the corresponding domain is usually required.
Practical implications
The experimental results support that MOOC resources are good enough for building keyword extraction models, and that an acceptable balance between human effort and model accuracy can be achieved.
Originality/value
This paper presents a pioneering study on keyword extraction from MOOC resources and reports some new findings.
Abstract
Purpose
This study aims to provide an overview of recent efforts relating to natural language processing (NLP) and machine learning applied to archival processing, particularly appraisal and sensitivity reviews, and to propose functional requirements and workflow considerations for transitioning these tools from experimental to operational use.
Design/methodology/approach
The paper has four main sections: 1) a short overview of the NLP and machine learning concepts referenced in the paper; 2) a review of the literature on NLP and machine learning applied to archival processes; 3) an overview of, and commentary on, key existing and developing tools that use NLP or machine learning techniques for archives; and 4) a discussion of functional requirements and workflow considerations for NLP and machine learning tools in archival processing, informed by the preceding review and analysis.
Findings
Applications for processing e-mail have received the most attention so far, although most initiatives have been experimental or project based. It now seems feasible to branch out to develop more generalized tools for born-digital, unstructured records. Effective NLP and machine learning tools for archival processing should be usable, interoperable, flexible, iterative and configurable.
Originality/value
Most implementations of NLP for archives have been experimental or project based. The main exception that has moved into production is ePADD, which includes robust NLP features through its named entity recognition module. This paper takes a broader view, assessing the prospects and possible directions for integrating NLP tools and techniques into archival workflows.
D. K. Malhotra, Kunal Malhotra and Rashmi Malhotra
Abstract
Traditionally, loan officers use various credit scoring models to complement judgmental methods when classifying consumer loan applications. This study explores the use of decision trees, AdaBoost and support vector machines (SVMs) to identify potential bad loans. Our results show that AdaBoost provides an improvement over both simple decision trees and SVM models in predicting good and bad credit clients. To cross-validate our results, we use k-fold cross-validation.
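The comparison described above, decision tree, SVM and AdaBoost evaluated with k-fold cross-validation, can be sketched with scikit-learn. The study's actual loan data and model settings are not given in the abstract, so synthetic data stands in here:

```python
# k-fold cross-validated comparison of decision tree, SVM and AdaBoost
# (synthetic data, not the study's consumer-loan records).
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=10, random_state=0)
models = {
    "decision tree": DecisionTreeClassifier(max_depth=3, random_state=0),
    "SVM": SVC(),
    "AdaBoost": AdaBoostClassifier(n_estimators=100, random_state=0),
}
# Mean accuracy over 10 folds for each model
kfold_scores = {name: cross_val_score(m, X, y, cv=10).mean()
                for name, m in models.items()}
```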