Search results
1 – 10 of over 35,000
Abstract
With the advent of Big Data, the ability to store and use unprecedented amounts of clinical information is now feasible via Electronic Health Records (EHRs). The massive collection of clinical data by health care systems and treatment centers can be productively used to perform predictive analytics on treatment plans to improve patient health outcomes. These massive data sets have stimulated opportunities to adapt computational algorithms to track and identify target areas for quality improvement in health care.
According to a report from the Association of American Medical Colleges, there will be an alarming gap between the demand for and the supply of the health care workforce in the near future. The projections show that by 2032 there will be a shortfall of between 46,900 and 121,900 physicians in the US (AAMC, 2019). Early prediction of health care risks is therefore a pressing requirement for improving health care quality and reducing health care costs. Predictive analytics uses historical data and algorithms based on either statistics or machine learning to develop predictive models that capture important trends. These models can predict the likelihood of future events. Predictive models developed using supervised machine learning approaches are commonly applied to health care problems such as disease diagnosis, treatment selection, and treatment personalization.
This chapter provides an overview of various machine learning and statistical techniques for developing predictive models. Case examples from the extant literature illustrate the role of predictive modeling in health care research. The adaptation of these predictive modeling techniques to Big Data analytics underscores the need for standardization and transparency, while recognizing the opportunities and challenges ahead.
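The supervised workflow the abstract describes — fit a model on historical records, then predict the likelihood of a future event — can be sketched as follows. This is a minimal illustration on synthetic data; the feature names and the "readmission" outcome are invented, not taken from the chapter.

```python
# Minimal sketch of a supervised predictive model on synthetic EHR-like data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1000
# Synthetic predictors (stand-ins for, e.g., age, blood pressure, a lab value)
X = rng.normal(size=(n, 3))
# Synthetic binary outcome driven by the first two predictors plus noise
logits = 1.2 * X[:, 0] + 0.8 * X[:, 1] + rng.normal(scale=0.5, size=n)
y = (logits > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LogisticRegression().fit(X_train, y_train)
# Out-of-sample discrimination, the usual first check for such models
auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
```

The same pattern carries over when logistic regression is swapped for any other supervised learner.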
Pratyush N. Sharma, Benjamin D. Liengaard, Joseph F. Hair, Marko Sarstedt and Christian M. Ringle
Abstract
Purpose
Researchers often stress the predictive goals of their partial least squares structural equation modeling (PLS-SEM) analyses. However, the method has long lacked a statistical test to compare different models in terms of their predictive accuracy and to establish whether a proposed model offers a significantly better out-of-sample predictive accuracy than a naïve benchmark. This paper aims to address this methodological research gap in predictive model assessment and selection in composite-based modeling.
Design/methodology/approach
Recent research has proposed the cross-validated predictive ability test (CVPAT) to compare theoretically established models. This paper proposes several extensions that broaden the scope of CVPAT and explains the key choices researchers must make when using them. A popular marketing model is used to illustrate the CVPAT extensions’ use and to make recommendations for the interpretation and benchmarking of the results.
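The core logic of CVPAT — compare a model's cross-validated per-observation losses against those of a naive benchmark with a paired test — can be sketched as follows. This illustrates the idea on synthetic data with an ordinary linear model and a training-mean benchmark; it is not the authors' composite-based implementation.

```python
# Hedged sketch of the CVPAT idea: paired test on cross-validated losses
# of a fitted model versus a naive (training-mean) benchmark.
import numpy as np
from scipy import stats
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold

rng = np.random.default_rng(1)
n = 200
X = rng.normal(size=(n, 2))
y = 2.0 * X[:, 0] - X[:, 1] + rng.normal(scale=0.5, size=n)

model_loss = np.empty(n)
naive_loss = np.empty(n)
for train, test in KFold(n_splits=5, shuffle=True, random_state=1).split(X):
    fit = LinearRegression().fit(X[train], y[train])
    model_loss[test] = (y[test] - fit.predict(X[test])) ** 2
    naive_loss[test] = (y[test] - y[train].mean()) ** 2  # naive benchmark

# One-sided paired test: does the model beat the naive benchmark?
t_stat, p_two_sided = stats.ttest_rel(naive_loss, model_loss)
p_one_sided = p_two_sided / 2 if t_stat > 0 else 1 - p_two_sided / 2
```

A small p-value indicates that the model's out-of-sample losses are significantly lower than the benchmark's, which is the kind of evidence CVPAT formalizes.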
Findings
This research asserts that prediction-oriented model assessments and comparisons are essential for theory development and validation. It recommends that researchers routinely consider the application of CVPAT and its extensions when analyzing their theoretical models.
Research limitations/implications
The findings offer several avenues for future research to extend and strengthen prediction-oriented model assessment and comparison in PLS-SEM.
Practical implications
Guidelines are provided for applying CVPAT extensions and reporting the results to help researchers substantiate their models’ predictive capabilities.
Originality/value
This research contributes to strengthening the predictive model validation practice in PLS-SEM, which is essential to derive managerial implications that are typically predictive in nature.
Wynne Chin, Jun-Hwa Cheah, Yide Liu, Hiram Ting, Xin-Jean Lim and Tat Huei Cham
Abstract
Purpose
Partial least squares structural equation modeling (PLS-SEM) has become popular in the information systems (IS) field for modeling structural relationships between latent variables as measured by manifest variables. However, while researchers using PLS-SEM routinely stress the causal-predictive nature of their analyses, model evaluation relies exclusively on criteria designed to assess the path model's explanatory power. To take full advantage of the causal-predictive purpose of PLS-SEM, it is imperative that researchers comprehend the efficacy of the various quality criteria, such as traditional PLS-SEM criteria, model fit, PLSpredict, the cross-validated predictive ability test (CVPAT) and model selection criteria.
Design/methodology/approach
A systematic review was conducted of empirical studies employing the causal prediction criteria available for PLS-SEM in the databases of Industrial Management and Data Systems (IMDS) and Management Information Systems Quarterly (MISQ). Furthermore, this study discusses the details of each of the procedures for the causal prediction criteria available for PLS-SEM, as well as how these criteria should be interpreted. While the focus of the paper is on demystifying the role of causal prediction modeling in PLS-SEM, the overarching aim is to compare the performance of the different quality criteria and to select the appropriate causal-predictive model from a cohort of competing models in the IS field.
Findings
The study found that the traditional PLS-SEM criteria (goodness of fit (GoF) by Tenenhaus, R2 and Q2) and model fit have difficulty determining the appropriate causal-predictive model. In contrast, PLSpredict, CVPAT and model selection criteria (i.e. Bayesian information criterion (BIC), BIC weight, Geweke–Meese criterion (GM), GM weight, HQ and HQC) were found to outperform the traditional criteria in determining the appropriate causal-predictive model, because these criteria provided both in-sample and out-of-sample predictions in PLS-SEM.
Originality/value
This research substantiates the use of PLSpredict, CVPAT and the model selection criteria (i.e. BIC, BIC weight, GM, GM weight, HQ and HQC). It provides IS researchers and practitioners with the knowledge they need to properly assess, report and interpret PLS-SEM results when the goal is causal prediction, thereby helping to safeguard the goal of using PLS-SEM in IS studies.
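Among the model selection criteria the abstract names, BIC is the most familiar. A small sketch of how it adjudicates between two nested linear models, using the standard Gaussian-likelihood form n·ln(RSS/n) + k·ln(n), may help; the data are synthetic, and this is generic linear regression, not the PLS-SEM variant discussed in the paper.

```python
# Illustrative sketch of BIC-based model selection on synthetic data;
# lower BIC indicates the preferred model.
import numpy as np
from sklearn.linear_model import LinearRegression

def bic(y, y_hat, k):
    """BIC for a Gaussian linear model with k estimated parameters."""
    n = len(y)
    rss = np.sum((y - y_hat) ** 2)
    return n * np.log(rss / n) + k * np.log(n)

rng = np.random.default_rng(2)
n = 300
X = rng.normal(size=(n, 2))
y = 1.5 * X[:, 0] - 0.7 * X[:, 1] + rng.normal(scale=1.0, size=n)

small = LinearRegression().fit(X[:, :1], y)   # omits a relevant predictor
full = LinearRegression().fit(X, y)           # correctly specified
bic_small = bic(y, small.predict(X[:, :1]), k=2)  # slope + intercept
bic_full = bic(y, full.predict(X), k=3)
# The correctly specified model attains the lower (better) BIC.
```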
Indranil Ghosh, Rabin K. Jana and Mohammad Zoynul Abedin
Abstract
Purpose
The prediction of Airbnb listing prices predominantly uses a set of amenity-driven features. Choosing an appropriate set of features from thousands of available amenity-driven features makes the prediction task difficult. This paper aims to propose a scalable, robust framework to predict listing prices of Airbnb units without using amenity-driven features.
Design/methodology/approach
The authors propose an artificial intelligence (AI)-based framework to predict Airbnb listing prices. They consider 75,000 Airbnb listings from five US cities, comprising more than 1.9 million observations. The proposed framework integrates (i) feature screening, (ii) stacking that combines gradient boosting, bagging and random forest, (iii) particle swarm optimization and (iv) explainable AI to accomplish the research objective.
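The stacking component described above — gradient boosting, bagging and random forest base learners combined by a meta-learner — can be sketched as follows. Synthetic data stand in for the Airbnb listings, and the meta-learner choice (ridge regression) is an assumption, not something stated in the abstract.

```python
# Hedged sketch of a stacking regressor over the three ensemble families
# named in the abstract, on synthetic regression data.
from sklearn.datasets import make_regression
from sklearn.ensemble import (BaggingRegressor, GradientBoostingRegressor,
                              RandomForestRegressor, StackingRegressor)
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=3)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=3)

stack = StackingRegressor(
    estimators=[
        ("gbr", GradientBoostingRegressor(random_state=3)),
        ("bag", BaggingRegressor(random_state=3)),
        ("rf", RandomForestRegressor(random_state=3)),
    ],
    final_estimator=RidgeCV(),  # meta-learner combining base predictions
)
stack.fit(X_train, y_train)
r2 = stack.score(X_test, y_test)  # held-out R^2 of the stacked model
```

The authors' framework additionally tunes such a pipeline with particle swarm optimization and interprets it with explainable-AI tools, neither of which is shown here.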
Findings
The key findings concern three aspects: prediction accuracy, homogeneity, and identification of the most and least predictable cities. The proposed framework yields highly accurate predictions. The predictability of listing prices varies significantly across cities: listing prices are most predictable for Boston and least predictable for Chicago.
Practical implications
Hosts can leverage the framework to determine rental prices, and they can augment their service offerings by emphasizing the key features the model identifies.
Originality/value
Although the individual components are known, the way they are integrated into the proposed framework to derive a high-quality forecast of Airbnb listing prices is unique, and the framework is scalable. Such a framework is rare in the Airbnb listing price modeling literature.
Christian Nnaemeka Egwim, Hafiz Alaka, Youlu Pan, Habeeb Balogun, Saheed Ajayi, Abdul Hye and Oluwapelumi Oluwaseun Egunjobi
Abstract
Purpose
The study aims to develop a multilayer, highly effective ensemble-of-ensembles predictive model (stacking ensemble) using several hyperparameter-optimized ensemble machine learning (ML) methods (bagging and boosting ensembles) trained on high-volume data points retrieved from Internet of Things (IoT) emission sensors, together with time-corresponding meteorology and traffic data.
Design/methodology/approach
To start, the study tested the big data hypothesis by developing sample ensemble predictive models on different data sample sizes and comparing their results. Second, it developed a standalone model and several bagging and boosting ensemble models and compared their results. Finally, it used the best-performing bagging and boosting predictive models as input estimators to develop a novel multilayer, highly effective stacking ensemble predictive model.
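The first step — comparing the same learner across growing sample sizes — can be sketched as follows. Synthetic data stand in for the IoT sensor, meteorology and traffic records, and random forest is used here only as a representative ensemble method.

```python
# Hedged sketch of the "big data hypothesis" check: train the same ensemble
# on growing sample sizes and compare held-out R^2.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=4000, n_features=8, noise=5.0, random_state=4)
X_test, y_test = X[3000:], y[3000:]  # hold out the last 1,000 rows

scores = {}
for size in (100, 500, 3000):
    model = RandomForestRegressor(n_estimators=50, random_state=4)
    model.fit(X[:size], y[:size])
    scores[size] = model.score(X_test, y_test)  # held-out R^2 per sample size
```

Plotting `scores` against sample size makes the data-size effect the study reports directly visible.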
Findings
The results showed data size to be one of the main determinants of ensemble ML predictive power. Second, they showed that, compared to a single algorithm, the cumulative result from ensemble ML algorithms is consistently better in terms of prediction accuracy. Finally, the stacking ensemble proved to be a better model for predicting PM2.5 concentration levels than the bagging and boosting ensemble models.
Research limitations/implications
A limitation of this study is the trade-off between the performance of this novel model and the computational time required to train it. Whether this gap can be closed remains an open research question, and future research should attempt to close it. Future studies could also integrate this novel model into a personal air quality messaging system to inform the public of pollution levels and improve public access to air quality forecasts.
Practical implications
The outcome of this study will help the public proactively identify highly polluted areas, thus potentially reducing pollution-associated or pollution-triggered COVID-19 (and other lung disease) deaths, complications and transmission by encouraging avoidance behavior, and it will support informed lockdown decisions by government bodies when integrated into an air pollution monitoring system.
Originality/value
This study fills a gap in the literature by providing a justification for selecting appropriate ensemble ML algorithms for PM2.5 concentration level predictive modeling. Second, it contributes to the big data hypothesis, which holds that data size is one of the most important factors in ML predictive capability. Third, it supports the premise that, when using ensemble ML algorithms, the cumulative output is consistently better in terms of prediction accuracy than that of a single algorithm. Finally, it develops a novel multilayer, high-performing, hyperparameter-optimized ensemble-of-ensembles predictive model that can accurately predict PM2.5 concentration levels with improved model interpretability and enhanced generalizability, and it provides a novel databank of historic pollution data from IoT emission sensors that can be purchased for research, consultancy and policymaking.
Serhat Peker, Altan Kocyigit and P. Erhan Eren
Abstract
Purpose
Predicting customers’ purchase behaviors is a challenging task. The literature has introduced individual-level and segment-based predictive modeling approaches for this purpose. Each method has its own advantages and drawbacks and performs well only in certain cases. The purpose of this paper is to propose a hybrid approach that predicts customers’ individual purchase behaviors and reduces the limitations of the two methods by combining their advantages.
Design/methodology/approach
The proposed hybrid approach builds on the individual-level and segment-based approaches and uses historical transactional data and predictive algorithms to generate predictions. Its effectiveness is experimentally evaluated in the domain of supermarket shopping, using real-world data and five popular machine learning classification algorithms: logistic regression, decision trees, support vector machines, neural networks and random forests.
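One plausible reading of such a hybrid — fit an individual-level model when a customer has enough history, otherwise fall back to a pooled segment-level model — can be sketched as follows. The data, the history threshold and the fallback rule are all invented for illustration; the paper's actual combination scheme may differ.

```python
# Hedged sketch of a hybrid individual/segment purchase-prediction scheme
# on synthetic per-customer transaction histories.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(5)

def make_history(n_visits):
    """Synthetic per-visit features and purchase labels for one customer."""
    X = rng.normal(size=(n_visits, 2))
    y = (X[:, 0] + rng.normal(scale=0.5, size=n_visits) > 0).astype(int)
    return X, y

customers = {"heavy": make_history(200), "light": make_history(5)}

# Segment model pooled over every customer in the segment
seg_X = np.vstack([X for X, _ in customers.values()])
seg_y = np.concatenate([y for _, y in customers.values()])
segment_model = LogisticRegression().fit(seg_X, seg_y)

def predict_next(customer, x_new, min_history=30):
    X, y = customers[customer]
    if len(y) >= min_history and len(set(y)) > 1:
        model = LogisticRegression().fit(X, y)   # individual-level model
    else:
        model = segment_model                    # segment-based fallback
    return model.predict(x_new.reshape(1, -1))[0]

pred = predict_next("light", np.array([1.0, 0.0]))
```

The fallback is what lets the hybrid keep the individual-level method's accuracy while extending prediction coverage to sparse customers, which matches the trade-off the findings describe.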
Findings
A comparison of results shows that the proposed hybrid approach substantially outperforms the individual-level and the segment-based approaches in terms of prediction coverage while maintaining roughly comparable prediction accuracy to the individual-level method. Moreover, the experimental results demonstrate that logistic regression performs better than the other classifiers in predicting customer purchase behavior.
Practical implications
The study concludes that the proposed approach would be beneficial for enterprises in terms of designing customized services and one-to-one marketing strategies.
Originality/value
This study is the first attempt to adopt a hybrid approach combining individual-level and segment-based approaches to predict customers’ individual purchase behaviors.
Gyeongcheol Cho, Sunmee Kim, Jonathan Lee, Heungsun Hwang, Marko Sarstedt and Christian M. Ringle
Abstract
Purpose
Generalized structured component analysis (GSCA) and partial least squares path modeling (PLSPM) are two key component-based approaches to structural equation modeling that facilitate the analysis of theoretically established models in terms of both explanation and prediction. This study aims to offer a comparative evaluation of GSCA and PLSPM in a predictive modeling framework.
Design/methodology/approach
A simulation study compares the predictive performance of GSCA and PLSPM under various simulation conditions and different prediction types of correctly specified and misspecified models.
Findings
The results suggest that GSCA with reflective composite indicators (GSCAR) is the most versatile approach. For observed prediction, which uses the component scores to generate prediction for the indicators, GSCAR performs slightly better than PLSPM with mode A. For operative prediction, which considers all parameter estimates to generate predictions, both methods perform equally well. GSCA with formative composite indicators and PLSPM with mode B generally lag behind the other methods.
Research limitations/implications
Future research may further assess the methods’ prediction precision, considering more experimental factors with a wider range of levels, including more extreme ones.
Practical implications
When prediction is the primary study aim, researchers should generally opt for GSCAR, considering its performance for observed and operative prediction together.
Originality/value
This research is the first to compare the relative efficacy of GSCA and PLSPM in terms of predictive power.
Abstract
Purpose
Many higher education institutions are investigating the possibility of developing predictive student success models that use different sources of data available to identify students that might be at risk of failing a course or program. The purpose of this paper is to review the methodological components related to the predictive models that have been developed or currently implemented in learning analytics applications in higher education.
Design/methodology/approach
The literature review was completed in three stages. First, the authors conducted searches and collected related full-text documents using various search terms and keywords. Second, they developed inclusion and exclusion criteria to identify the most relevant citations for the purpose of the current review. Third, they reviewed each document from the final compiled bibliography, focusing on identifying the information needed to answer the research questions.
Findings
In this review, the authors identify methodological strengths and weaknesses of current predictive learning analytics applications and provide the most up-to-date recommendations on predictive model development, use and evaluation. The review results can inform important future areas of research that could strengthen the development of predictive learning analytics for the purpose of generating valuable feedback to students to help them succeed in higher education.
Originality/value
This review provides an overview of the methodological considerations for researchers and practitioners who are planning to develop or currently in the process of developing predictive student success models in the context of higher education.
Indranil Ghosh, Rabin K. Jana and Dinesh K. Sharma
Abstract
Purpose
Owing to highly volatile and chaotic external events, predicting future movements of cryptocurrencies is a challenging task. This paper advances a granular hybrid predictive modeling framework for predicting the future values of Bitcoin (BTC), Litecoin (LTC), Ethereum (ETH), Stellar (XLM) and Tether (USDT) during normal and pandemic regimes.
Design/methodology/approach
Initially, the major temporal characteristics of the price series are examined. In the second stage, ensemble empirical mode decomposition (EEMD) and maximal overlap discrete wavelet transformation (MODWT) are used to decompose the original time series into two distinct sets of granular subseries. In the third stage, a long short-term memory (LSTM) network and extreme gradient boosting (XGB) are applied to the decomposed subseries to estimate the initial forecasts. Lastly, sequential quadratic programming (SQP) is used to obtain the final forecast by combining the initial forecasts.
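The final combination stage can be sketched with SciPy's SLSQP solver (a sequential quadratic programming method): find non-negative weights summing to one that blend two initial forecasts so as to minimize the combined error. The forecasts below are synthetic stand-ins for the LSTM and XGB outputs, and the simplex constraint is an assumption about how the combination is set up.

```python
# Hedged sketch of SQP-based forecast combination on synthetic series.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(6)
actual = np.cumsum(rng.normal(size=100))       # synthetic price path
f1 = actual + rng.normal(scale=1.0, size=100)  # e.g. an LSTM initial forecast
f2 = actual + rng.normal(scale=2.0, size=100)  # e.g. an XGB initial forecast

def combined_mse(w):
    """MSE of the weighted blend of the two initial forecasts."""
    return np.mean((actual - (w[0] * f1 + w[1] * f2)) ** 2)

result = minimize(
    combined_mse, x0=[0.5, 0.5], method="SLSQP",
    bounds=[(0, 1), (0, 1)],
    constraints={"type": "eq", "fun": lambda w: np.sum(w) - 1},
)
weights = result.x  # the more accurate forecast receives the larger weight
```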
Findings
Rigorous performance assessment and the outcome of the Diebold–Mariano pairwise statistical test demonstrate the efficacy of the suggested predictive framework, which yields commendable predictive performance during the COVID-19 pandemic timeline as well. Future trends of BTC and ETH are found to be relatively easy to predict, while USDT is relatively difficult to predict.
Originality/value
The robustness of the proposed framework can be leveraged for practical trading and for managing investments in the crypto market. The empirical properties of the temporal dynamics of the chosen cryptocurrencies provide deeper insights.
Florian Schuberth, Manuel E. Rademaker and Jörg Henseler
Abstract
Purpose
This study aims to examine the role of an overall model fit assessment in the context of partial least squares path modeling (PLS-PM). In doing so, it will explain when it is important to assess the overall model fit and provides ways of assessing the fit of composite models. Moreover, it will resolve major concerns about model fit assessment that have been raised in the literature on PLS-PM.
Design/methodology/approach
This paper explains when and how to assess the fit of PLS path models. Furthermore, it discusses the concerns raised in the PLS-PM literature about the overall model fit assessment and provides concise guidelines on assessing the overall fit of composite models.
Findings
This study explains that the model fit assessment is as important for composite models as it is for common factor models. To assess the overall fit of composite models, researchers can use a statistical test and several fit indices known through structural equation modeling (SEM) with latent variables.
Research limitations/implications
Researchers who use PLS-PM to assess composite models that aim to understand the mechanism of an underlying population and draw statistical inferences should take the concept of the overall model fit seriously.
Practical implications
To facilitate the overall fit assessment of composite models, this study presents a two-step procedure adopted from the literature on SEM with latent variables.
Originality/value
This paper clarifies that the necessity to assess model fit is not a question of which estimator is used (PLS-PM, maximum likelihood, etc.) but of the purpose of statistical modeling. Whereas model fit assessment is paramount in explanatory modeling, it is not imperative in predictive modeling.