Predictive analytics for blood glucose concentration: an empirical study using the tree-based ensemble approach

Jiaming Liu (College of Economics and Management, Beijing University of Chemical Technology, Beijing, China)
Liuan Wang (School of Economics and Management, Beihang University, Beijing, China)
Linan Zhang (Department of Hematology, Chui Yang Liu Hospital Affiliated to Tsinghua University, Beijing, China)
Zeming Zhang (School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China)
Sicheng Zhang (College of Economics and Management, Beijing University of Chemical Technology, Beijing, China)

Library Hi Tech

ISSN: 0737-8831

Article publication date: 7 July 2020

Issue publication date: 4 November 2020

Abstract

Purpose

The primary objective of this study was to identify critical indicators for predicting blood glucose (BG) through data-driven methods and to compare the prediction performance of four tree-based ensemble models: bagging with tree regressors (Bagging-DT), AdaBoost with tree regressors (AdaBoost-DT), random forest (RF) and gradient boosting decision tree (GBDT).

Design/methodology/approach

This study proposed a majority voting feature selection method that combines lasso regression with the Akaike information criterion (LR-AIC), lasso regression with the Bayesian information criterion (LR-BIC) and RF to select the indicators with the strongest predictive performance from an initial set of 38 indicators in 5,642 samples. The selected features were used to build the tree-based ensemble models. The performance of each ensemble model was evaluated with 10-fold cross-validation (CV).
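The voting and evaluation steps described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the per-selector feature lists are hypothetical placeholders, the threshold of two votes out of three is an assumed reading of "majority", and the function names `majority_vote` and `k_fold_indices` are invented for this example. The fold splitter uses contiguous, unshuffled folds for simplicity.

```python
from collections import Counter


def majority_vote(selections, min_votes=2):
    """Keep features chosen by at least `min_votes` of the selectors.

    `selections` maps a selector name to the features it picked.
    The default of 2 assumes three selectors and a simple majority.
    """
    votes = Counter(
        feature
        for chosen in selections.values()
        for feature in set(chosen)  # a selector votes once per feature
    )
    return sorted(f for f, n in votes.items() if n >= min_votes)


def k_fold_indices(n_samples, k=10):
    """Yield (train, test) index lists for k contiguous folds."""
    # Spread the remainder over the first folds so sizes differ by at most 1.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


# Hypothetical selector outputs, for illustration only.
selections = {
    "LR-AIC": ["age", "CHC", "RBCVDW", "leucocyte_count"],
    "LR-BIC": ["age", "CHC"],
    "RF": ["age", "RBCVDW", "red_blood_cell_volume"],
}

selected = majority_vote(selections)

# 5,642 samples and 10 folds, matching the counts given in the abstract.
folds = list(k_fold_indices(5642, k=10))
```

In a full pipeline, each ensemble model would be fitted on the `train` indices of every fold using only the `selected` features and scored on the corresponding `test` indices, with the ten scores then averaged.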

Findings

The feature selection results indicated that age, corpuscular hemoglobin concentration (CHC), red blood cell volume distribution width (RBCVDW), red blood cell volume and leucocyte count are the five most important clinical/physical indicators for BG prediction. The study also found that the GBDT ensemble model combined with the proposed majority voting feature selection method outperforms the other three models in both prediction performance and stability.

Practical implications

This study proposed a novel BG prediction framework for better predictive analytics in health care.

Social implications

This study combined medical background knowledge with machine learning technology, with the aim of helping to reduce diabetes morbidity and to formulate precise medical schemes.

Originality/value

The majority voting feature selection method combined with the GBDT ensemble model provides an effective decision-making tool for predicting BG and detecting diabetes risk in advance.

Acknowledgements

This work was supported by the National Natural Science Foundation of China (No. 71901014), the Postdoctoral Science Foundation of China (No. 2019M660427) and the Funds for First-class Discipline Construction (XK1802-5).

Citation

Liu, J., Wang, L., Zhang, L., Zhang, Z. and Zhang, S. (2020), "Predictive analytics for blood glucose concentration: an empirical study using the tree-based ensemble approach", Library Hi Tech, Vol. 38 No. 4, pp. 835-858. https://doi.org/10.1108/LHT-08-2019-0171

Publisher

Emerald Publishing Limited

Copyright © 2020, Emerald Publishing Limited
