Search results
1 – 10 of 50
S B Kotsiantis and P E Pintelas
Abstract
Machine Learning algorithms fed with data sets which include information such as attendance data, test scores and other student information can provide tutors with powerful tools for decision‐making. Until now, much of the research has been limited to the relation between single variables and student performance. Combining multiple variables as possible predictors of dropout has generally been overlooked. The aim of this work is to present a high level architecture and a case study for a prototype machine learning tool which can automatically recognize dropout‐prone students in university level distance learning classes. Tracking student progress is a time‐consuming job which can be handled automatically by such a tool. While the tutors will still have an essential role in monitoring and evaluating student progress, the tool can compile the data required for reasonable and efficient monitoring. What is more, the application of the tool is not restricted to predicting dropout‐prone students: it can also be used for the prediction of students’ marks, for the prediction of how many students will submit a written assignment, etc. It can also help tutors explore data and build models for prediction, forecasting and classification. Finally, the underlying architecture is independent of the data set and as such it can be used to develop other similar tools.
Elham Mahamedi, Martin Wonders, Nima Gerami Seresht, Wai Lok Woo and Mohamad Kassem
Abstract
Purpose
The purpose of this paper is to propose a novel data-driven approach for predicting energy performance of buildings that can address the scarcity of quality data, and consider the dynamic nature of building systems.
Design/methodology/approach
This paper proposes a reinforcing machine learning (ML) approach based on transfer learning (TL) to address these challenges. The proposed approach dynamically incorporates the data captured by the building management systems into the model to improve its accuracy.
Findings
It was shown that the proposed approach could improve the accuracy of the energy performance prediction compared to the conventional TL (non-reinforcing) approach by 19 percentage points in mean absolute percentage error.
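The improvement above is stated in mean absolute percentage error (MAPE). As a minimal sketch of how that metric is computed, the snippet below uses made-up energy readings and two hypothetical sets of predictions; none of the numbers come from the paper.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, expressed in percent."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Hypothetical readings (kWh) and predictions, for illustration only.
actual = [100.0, 200.0, 400.0]
baseline = [130.0, 150.0, 500.0]   # stand-in for a conventional model
improved = [110.0, 190.0, 380.0]   # stand-in for a refined model

# A difference between two MAPE values is measured in percentage points.
improvement_points = mape(actual, baseline) - mape(actual, improved)
print(f"baseline MAPE: {mape(actual, baseline):.2f}%")
print(f"improved MAPE: {mape(actual, improved):.2f}%")
print(f"improvement:   {improvement_points:.2f} percentage points")
```

Because MAPE divides by the actual value, it cannot be used when a true reading is zero, which is why the sketch rejects such inputs.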
Research limitations/implications
The case study results confirm the practicality of the proposed approach and show that it outperforms the standard ML approach (with no transferred knowledge) when little data is available.
Originality/value
This approach contributes to the body of knowledge by addressing the limited data availability in the building sector using TL; and accounting for the dynamics of buildings’ energy performance by the reinforcing architecture. The proposed approach is implemented in a case study project based in London, UK.
Ye Bai, Xinlong Li and Hongye Sun
Abstract
Purpose
In online purchase for dietary supplements, due to the lack of professional advice from pharmacists, electronic word-of-mouth (eWOM) has become an important source of information for consumers to make purchase decisions. How can firms use eWOM resources to increase sales? The purpose of this paper is to provide practical methods for firms by exploring the effects of eWOM on sales and developing a sales prediction model based on eWOM.
Design/methodology/approach
The data came from 120 dietary supplements on Tmall.com. The authors extracted product sales as the dependent variable and 11 eWOM factors as independent variables. Multicollinearity was tested using the variance inflation factor and the least absolute shrinkage and selection operator. Multiple linear regression was used to investigate the effects of eWOM on sales. Drawing on white- and black-box approaches, six models were developed. By comparing the root mean square error of the models, the authors selected the optimal one as their target sales prediction model.
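The selection step described above, picking the candidate with the lowest root mean square error (RMSE), can be sketched as follows. The sales figures and per-model predictions are invented for illustration, and only two of the six candidates are shown; they are not the authors' data or code.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


# Hypothetical held-out sales and predictions from two candidate models.
actual_sales = [10.0, 20.0, 30.0]
candidate_predictions = {
    "linear_regression": [12.0, 18.0, 33.0],
    "random_forest": [11.0, 19.0, 31.0],
}

# Score every candidate and keep the one with the smallest error.
scores = {name: rmse(actual_sales, preds)
          for name, preds in candidate_predictions.items()}
best_model = min(scores, key=scores.get)
print(scores, best_model)
```

RMSE penalizes large errors more heavily than small ones, so a model with a few badly missed products scores worse than one with uniformly modest errors.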
Findings
Product ratings, total reviews and favorites are positively and strongly associated with sales. Questions and additional reviews have negative effects on sales. The random forest model has the best prediction performance.
Originality/value
The research focuses on eWOM for dietary supplements. First, the authors show that easily accessible eWOM from online platforms can be used to evaluate effects and predict sales. Second, the authors introduce white- and black-box models through machine learning to assess eWOM. Firms could use the described models to foster their marketing initiatives.
Joanna Jedrzejowicz and Jakub Neumann
Abstract
Purpose
This paper seeks to describe XML technologies and to show how they can be applied for developing web‐based courses and supporting authors who do not have much experience with the preparation of web‐based courses.
Design/methodology/approach
When developing online courses the academic staff has to address the following problem – how to keep pace with the ever‐changing technology. Using XML technologies helps to develop a learning environment which can be useful for academics when designing web‐based courses, preparing the materials and then reusing them.
Findings
The paper discusses the benefits of using XML for developing computer‐based courses. The task of introducing new versions of existing courses can be reduced to editing appropriate XML files without any need for program change and an author can perform this task easily from a computer connected to the internet. What is more – using XML makes it possible to reuse data in different teaching situations.
Research limitations/implications
The environment has only been used for two years and further research is needed on how user‐friendly the system really is and how it can still be improved.
Practical implications
The paper describes the environment which can be used to develop and reuse online materials, courses, metadata etc.
Originality/value
The paper offers practical help to academics interested in web‐based teaching.
Omran Alomran, Robin Qiu and Hui Yang
Abstract
Purpose
Breast cancer is a global public health dilemma and the most prevalent cancer in the world. Effective treatment plans improve patient survival rates and well-being. The five-year survival rate is often used to develop treatment selection and survival prediction models. However, unlike other types of cancer, breast cancer patients can have long survival rates. Therefore, the authors propose a novel two-level framework to provide clinical decision support for treatment selection contingent on survival prediction.
Design/methodology/approach
The first level classifies patients into different survival periods using machine learning algorithms. The second level has two models with different survival rates (five-year and ten-year). Thus, based on the classification results of the first level, the authors employed Bayesian networks (BNs) to infer the effect of treatment on survival in the second level.
Findings
The authors validated the proposed approach with electronic health record data from the TriNetX Research Network. For the first level, the authors obtained 85% accuracy in survival classification. For the second level, the authors found that the topology of BNs using Causal Minimum Message Length had the highest accuracy and area under the ROC curve for both models. Notably, treatment selection substantially impacted survival rates, implying the two-level approach better aided clinical decision support on treatment selection.
Originality/value
The authors have developed a reference tool for medical practitioners that supports treatment decisions and patient education to identify patient treatment preferences and to enhance patient healthcare.
Olugbenga Wilson Adejo and Thomas Connolly
Abstract
Purpose
The purpose of this paper is to empirically investigate and compare the use of multiple data sources, different classifiers and ensembles of classifiers in predicting student academic performance. The study compares the performance and efficiency of ensemble techniques that use different combinations of data sources with those of base classifiers that use a single data source.
Design/methodology/approach
Using a quantitative research methodology, data samples of 141 learners enrolled in the University of the West of Scotland were extracted from the institution’s databases and also collected through a survey questionnaire. The research focused on three data sources (student record system, learning management system and survey) and used three state-of-the-art data mining classifiers, namely decision tree, artificial neural network and support vector machine, for the modeling. In addition, ensembles of these base classifiers were used in the student performance prediction, and the performances of the seven different models developed were compared using six different evaluation metrics.
Findings
The results show that using multiple data sources along with heterogeneous ensemble techniques is very efficient and accurate in predicting student performance, and that it helps in properly identifying students at risk of attrition.
Practical implications
The approach proposed in this study will help educational administrators and policy makers working within the educational sector to develop new policies and curricula on higher education that are relevant to student retention. In addition, the general implication of this research for practice is its ability to help accurately in the early identification of students at risk of dropping out of HE from the combination of data sources, so that the necessary support and intervention can be provided.
Originality/value
The research empirically investigated and compared the performance accuracy and efficiency of single classifiers and ensemble of classifiers that make use of single and multiple data sources. The study has developed a novel hybrid model that can be used for predicting student performance that is high in accuracy and efficient in performance. Generally, this research study advances the understanding of the application of ensemble techniques to predicting student performance using learner data and has successfully addressed these fundamental questions: What combination of variables will accurately predict student academic performance? What is the potential of the use of stacking ensemble techniques in accurately predicting student academic performance?
Abstract
Purpose
Financial statement fraud (FSF) committed by companies implies the current status of the companies may not be healthy. As such, it is important to detect FSF, since such companies tend to conceal bad information, which causes a great loss to various stakeholders. Thus, the objective of the paper is to propose a novel approach to building a classification model to identify FSF, which shows high classification performance and from which human-readable rules are extracted to explain why a company is likely to commit FSF.
Design/methodology/approach
Having prepared multiple sub-datasets to cope with the class imbalance problem, we build a set of decision trees for each sub-dataset; select a subset of that set as the model for the sub-dataset by removing every tree whose performance is below the average accuracy of all trees in the set; and then select the model that shows the best accuracy among these models. We call the resulting model MRF (Modified Random Forest). Given a new instance, we extract rules from the MRF model to explain whether the company corresponding to the new instance is likely to commit FSF.
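The pruning and selection steps described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: simple threshold rules stand in for trained decision trees, a model's accuracy is assumed to be that of a majority vote over its surviving trees, and the toy rows, labels and helper names (`stump`, `prune_below_average`, `select_best_model`) are hypothetical. Rule extraction is not shown.

```python
from statistics import mean


def stump(threshold):
    """Hypothetical stand-in for a trained tree: flags fraud (1) above a threshold."""
    return lambda x: int(x > threshold)


def accuracy(predict, rows, labels):
    """Fraction of rows for which predict(row) equals the label."""
    return mean([1.0 if predict(r) == y else 0.0 for r, y in zip(rows, labels)])


def prune_below_average(trees, rows, labels):
    """Drop every tree whose accuracy is below the average over the set."""
    scores = [accuracy(t, rows, labels) for t in trees]
    average = mean(scores)
    return [t for t, s in zip(trees, scores) if s >= average]


def majority_vote(trees):
    """Assumed combination rule: predict 1 only on a strict majority of 1-votes."""
    def predict(row):
        votes = [t(row) for t in trees]
        return int(sum(votes) * 2 > len(votes))
    return predict


def select_best_model(tree_sets, rows, labels):
    """Prune each sub-dataset's trees, then keep the most accurate pruned set."""
    pruned = [prune_below_average(ts, rows, labels) for ts in tree_sets]
    return max(pruned, key=lambda ts: accuracy(majority_vote(ts), rows, labels))


# Toy evaluation data: one numeric feature, label 1 = fraud (illustrative only).
rows = [0.1, 0.4, 0.6, 0.9]
labels = [0, 0, 1, 1]

# One list of "trees" per sub-dataset.
tree_sets = [
    [stump(0.5), stump(0.3), stump(0.95)],
    [stump(0.3), stump(0.95)],
]
best = select_best_model(tree_sets, rows, labels)
best_accuracy = accuracy(majority_vote(best), rows, labels)
print(len(best), best_accuracy)
```

The sketch scores trees on the same rows used for selection to stay short; a held-out validation set would be the more careful choice in practice.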
Findings
Experimental results show that the MRF classifier outperformed the benchmark models. The results also revealed that all the variables related to profit are among the most important indicators of FSF, and they identified two new variables related to gross profit that had not been examined in previous studies on FSF.
Originality/value
This study proposed a method of building a classification model which shows outstanding performance and provides decision rules that can be used to explain the classification results. In addition, a new way to resolve the class imbalance problem was suggested in this paper.
Xiang Zheng, Mingjie Li, Ze Wan and Yan Zhang
Abstract
Purpose
This study aims to extract knowledge of ancient Chinese scientific and technological documents bibliographic summaries (STDBS) and provide the knowledge graph (KG) comprehensively and systematically. By presenting the relationship among content, discipline, and author, this study focuses on providing services for knowledge discovery of ancient Chinese scientific and technological documents.
Design/methodology/approach
This study compiles ancient Chinese STDBS and designs a knowledge mining and graph visualization framework. The authors define the summaries' entities, attributes, and relationships for knowledge representation, use deep learning techniques such as BERT-BiLSTM-CRF models and rules for knowledge extraction, unify the representation of entities for knowledge fusion, and use Neo4j and other visualization techniques for KG construction and application. This study presents the generation, distribution, and evolution of ancient Chinese agricultural scientific and technological knowledge in visualization graphs.
Findings
The knowledge mining and graph visualization framework is feasible and effective. The BERT-BiLSTM-CRF model has domain adaptability and accuracy. The knowledge generation of ancient Chinese agricultural scientific and technological documents has distinctive time features. The knowledge distribution is uneven, concentrated mainly on C1-Planting and cultivation, C2-Silkworm, and C3-Mulberry and water conservancy. The knowledge evolution is apparent, and differentiation and integration coexist.
Originality/value
This study is the first to visually present the knowledge connotation and association of ancient Chinese STDBS. It solves the problems of the lack of in-depth knowledge mining and connotation visualization of ancient Chinese STDBS.
Li Li, Hsin-Hung Wu, Chih-Hsuan Huang, Yuanyang Zou and Xiao Ya Li
Abstract
Purpose
Understanding the antecedents of patient safety culture among medical staff is essential if hospital managers are to promote explicit patient safety policies and strategies. The factors that influence patient safety culture have received little attention. The authors aim to investigate the antecedents of patient safety culture (safety climate) in relation to medical staff to develop a comprehensive approach to improve patient safety and the quality of medical care in China.
Design/methodology/approach
The Chinese version of the Safety Attitudes Questionnaire (CSAQ) was used to examine the attitudes toward patient safety among physicians and nurses. These medical staff were asked to submit the intra-organizational online survey via email. A total of 1780 questionnaires were issued, and 256 usable questionnaires were returned, yielding a response rate of 14.38%. One-way analysis of variance (ANOVA) was employed to test whether sex, supervisor/manager status, age, working experience, and education result in different perceptions. Confirmatory factor analysis (CFA) was used to verify the structure of the data. Then linear regression with forward selection was performed to obtain the essential dimension(s) that affect the safety culture (safety climate).
Findings
The CFA results showed that 26 CSAQ items measured 6 safety-related dimensions. The linear regression results indicated that working conditions, teamwork climate, and job satisfaction had significant positive effects on safety culture (safety climate).
Practical implications
Hospital managers should put increased effort into essential elements of patient-oriented safety culture, such as working conditions, teamwork climate, and job satisfaction to develop appropriate avenues to improve the quality of delivered medical services as well as the safety of patients.
Originality/value
This study focused on the contribution that the antecedents of patient safety culture (safety climate) make with reference to the perspective of medical staff in a tertiary hospital in China.
Abstract
Purpose
The purpose of this paper is to discuss and assess the structural characteristics (conceptual utility) of the most popular classification and predictive techniques employed in customer relationship management and customer scoring and to evaluate their classification and predictive precision.
Design/methodology/approach
A sample of customers' credit ratings and socio‐demographic profiles is employed to evaluate the analytic and classification properties of discriminant analysis, binary logistic regression, artificial neural networks, the C5 algorithm, and regression trees employing Chi‐squared Automatic Interaction Detector (CHAID).
Findings
With regard to interpretability and the conceptual utility of the parameters generated by the five techniques, logistic regression provides easily interpretable parameters through its logit. The logits can be interpreted in the same way as regression slopes. In addition, the logits can be converted to odds, providing a common-sense evaluation of the relative importance of each independent variable. The technique also provides robust statistical tests to evaluate the model parameters. Finally, both CHAID and the C5 algorithm provide visual tools (regression tree) and semantic rules (rule set for classification) to facilitate the interpretation of the model parameters. These can be highly desirable properties when the researcher attempts to explain the conceptual and operational foundations of the model.
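The conversion from logits to odds mentioned above is an exponentiation: a logistic regression coefficient b corresponds to an odds ratio of exp(b) per unit increase in its variable. A minimal sketch follows, with hypothetical variable names and coefficient values that are not taken from the paper.

```python
import math


def odds_ratio(coefficient):
    """Multiplicative change in the odds per unit increase of the variable."""
    return math.exp(coefficient)


def probability(logit):
    """Convert a logit (log-odds) into a probability."""
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical fitted coefficients, for illustration only.
coefficients = {"income": 0.7, "late_payments": -1.2}
odds = {name: odds_ratio(b) for name, b in coefficients.items()}

for name, ratio in odds.items():
    direction = "raises" if ratio > 1 else "lowers"
    print(f"one more unit of {name} {direction} the odds by a factor of {ratio:.2f}")
```

An odds ratio above 1 means the variable increases the odds of the outcome, and a ratio below 1 means it decreases them. Comparing ratios across variables is only meaningful when the variables are on comparable scales.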
Originality/value
Most treatments of complex classification procedures have been undertaken idiosyncratically, that is, evaluating only one technique. This paper evaluates and compares the conceptual utility and predictive precision of five different classification techniques on a moderate sample size and provides clear guidelines in technique selection when undertaking customer scoring and classification.