Search results
Xiaobo Tang, Heshen Zhou and Shixuan Li
Abstract
Purpose
Predicting highly cited papers can enable an evaluation of the potential of papers and the early detection and determination of academic achievement value. However, most highly cited paper prediction studies rely on early citation information, so predicting highly cited papers at the time of publication is challenging. Therefore, the authors propose a method for predicting early highly cited papers based on their own features.
Design/methodology/approach
This research analyzed academic papers published in the Journal of the Association for Computing Machinery (ACM) from 2000 to 2013. Five types of features were extracted: paper features, journal features, author features, reference features and semantic features. Subsequently, the authors applied a deep neural network (DNN), support vector machine (SVM), decision tree (DT) and logistic regression (LGR), and they predicted highly cited papers 1–3 years after publication.
Findings
Experimental results showed that early highly cited academic papers are predictable when they are first published. The authors’ prediction models showed considerable performance. This study further confirmed that the features of references and authors play an important role in predicting early highly cited papers. In addition, the proportion of high-quality journal references has a more significant impact on prediction.
Originality/value
Based on the available information at the time of publication, this study proposed an effective early highly cited paper prediction model. This study facilitates the early discovery and realization of the value of scientific and technological achievements.
Amal Ben Soussia, Chahrazed Labba, Azim Roussanaly and Anne Boyer
Abstract
Purpose
The goal is to assess performance prediction systems (PPS) that are used to assist at-risk learners.
Design/methodology/approach
The authors propose time-dependent metrics including earliness and stability. The authors investigate the relationships between the various temporal metrics and the precision metrics in order to identify the key earliness points in the prediction process. The authors propose an algorithm for computing earliness. Furthermore, the authors propose using an earliness-stability score (ESS) to investigate the relationship between the earliness of a classifier and its stability. The ESS is used to examine the trade-off between time-dependent metrics only. The aim is to compare its use to that of the earliness-accuracy score (EAS).
Findings
Stability and accuracy are proportional when the system's accuracy increases or decreases over time. However, when the accuracy stagnates or varies slightly, the system's stability decreases rather than stagnates. As a result, the use of ESS and EAS is complementary and allows for a better definition of the point of earliness in time by studying the relationship between earliness and accuracy on the one hand and earliness and stability on the other.
Originality/value
When evaluating the performance of PPS, the temporal dimension is an important factor that is overlooked by traditional measures. Current metrics are not well suited to assessing a PPS’s ability to predict correctly at the earliest point, or to monitoring the stability and evolution of predictions over time. Thus, in this work, the authors propose time-dependent metrics, including earliness, stability and the trade-offs, with the objective of assessing PPS over time.
Clarence N.W. Tan and Herlina Dihardjo
Abstract
Outlines previous research on company failure prediction and discusses some of the methodological issues involved. Extends an earlier study (Tan 1997) using artificial neural networks (ANN) to predict financial distress in Australian credit unions by extending the forecast period of the models, presents the results and compares them with probit model results. Finds the ANN models generally at least as good as the probit, although both types improved their accuracy rates (for Type I and Type II errors) when early warning signals were included. Believes ANN “is a promising technique” although more research is required, and suggests some avenues for this.
Abstract
Startup duration for machine intensive plants using technologically new processes has often been long and unpredictable. Startup is defined here as the time from production of the first good unit until the plant is producing regularly at full capacity. A six-year startup duration is not unusual for a plant which may cost from one million to one hundred million dollars. This is a costly problem. Startup of continuous steel casting plants was the particular machine intensive, technologically new process explored in search of a solution. This exploration sought better measures of startup and a fuller understanding of how these measures evolve in order to define and test a better method of predicting startup duration. Two‐thirds of the duration predictions resulting from the new method were accurate to within six months, for startups whose median length was 30 months. In many cases, these predictions would have been much more accurate than the expectations of the managers involved. The development of this useful prediction method and its application proceeded from both production data and theoretical bases.
Jui-Long Hung, Kerry Rice, Jennifer Kepka and Juan Yang
Abstract
Purpose
For studies in educational data mining or learning analytics, the prediction of student performance, or early warning, is one of the most popular research topics. However, research gaps indicate a paucity of research using machine learning and deep learning (DL) models in predictive analytics that include both behaviors and text analysis.
Design/methodology/approach
This study combined behavioral data and discussion board content to construct early warning models with machine learning and DL algorithms. In total, 680 course sections, 12,869 students and 14,951,368 logs were collected from a K-12 virtual school in the USA. Three rounds of experiments were conducted to demonstrate the effectiveness of the proposed approach.
Findings
The DL model performed better than machine learning models and was able to capture 51% of at-risk students in the eighth week with 86.8% overall accuracy. The combination of behavioral and textual data further improved the model’s performance in both recall and accuracy rates. The total word count is a more general indicator than the textual content feature. When text was imported into a linguistic function word analysis tool, successful students showed more words classed as analytic and at-risk students showed more words classed as authentic. The balanced threshold was 0.315, which can capture up to 59% of at-risk students.
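The abstract does not say how the balanced threshold was chosen. As a rough illustration only, the sketch below shows how lowering a decision threshold on predicted at-risk scores can raise recall. The scores, labels and function names are made up for this example and are not taken from the study.

```python
def confusion_at_threshold(scores, labels, threshold):
    """Count TP, FP, TN, FN when a score >= threshold flags a student as at-risk (label 1)."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        flagged = score >= threshold
        if flagged and label == 1:
            tp += 1
        elif flagged and label == 0:
            fp += 1
        elif not flagged and label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def recall_and_accuracy(scores, labels, threshold):
    """Return (recall on the at-risk class, overall accuracy) at one threshold."""
    tp, fp, tn, fn = confusion_at_threshold(scores, labels, threshold)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / len(labels)
    return recall, accuracy


# Hypothetical model outputs: 1 = at-risk, 0 = successful.
scores = [0.10, 0.20, 0.35, 0.40, 0.60, 0.80, 0.25, 0.05]
labels = [0, 0, 1, 0, 1, 1, 1, 0]

for threshold in (0.5, 0.315):
    recall, accuracy = recall_and_accuracy(scores, labels, threshold)
    print(f"threshold={threshold}: recall={recall:.2f}, accuracy={accuracy:.2f}")
```

On this toy data the lower threshold flags one more at-risk student at the cost of one false alarm, which is the kind of trade-off a balanced threshold is meant to settle.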
Originality/value
The results of this exploratory study indicate that the use of student behaviors and text in a DL approach may improve the predictive power of identifying at-risk learners early enough in the learning process to allow for interventions that can change the course of their trajectory.
Thao-Trang Huynh-Cam, Long-Sheng Chen and Tzu-Chuen Lu
Abstract
Purpose
This study aimed to use enrollment information including demographic, family background and financial status, which can be gathered before the first semester starts, to construct early prediction models (EPMs) and extract crucial factors associated with first-year student dropout probability.
Design/methodology/approach
The real-world samples comprised the enrollment records of 2,412 first-year students of a private university (UNI) in Taiwan. This work utilized decision trees (DT), multilayer perceptron (MLP) and logistic regression (LR) algorithms for constructing EPMs; under-sampling, random oversampling and synthetic minority oversampling technique (SMOTE) methods for solving data imbalance problems; and accuracy, precision, recall, F1-score, receiver operating characteristic (ROC) curve and area under the ROC curve (AUC) for evaluating the constructed EPMs.
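Two of the ingredients named above, random oversampling and the precision, recall and F1-score metrics, can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the authors' pipeline: the function names and toy records are hypothetical, and SMOTE (which synthesizes new minority samples rather than duplicating existing ones) is not shown.

```python
import random


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many rows as the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall and F1-score for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical imbalanced data: 8 retained students (0), 2 dropouts (1).
rows = [[i] for i in range(10)]
labels = [0] * 8 + [1] * 2
balanced_rows, balanced_labels = random_oversample(rows, labels)
print(balanced_labels.count(0), balanced_labels.count(1))
```

Oversampling should be applied to the training split only, so that duplicated records do not leak into the data used for evaluation.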
Findings
DT outperformed MLP and LR with accuracy (97.59%), precision (98%), recall (97%), F1-score (97%) and ROC-AUC (98%). The top-ranking factors comprised “student loan,” “dad occupations,” “mom educational level,” “department,” “mom occupations,” “admission type,” “school fee waiver” and “main sources of living.”
Practical implications
This work only used enrollment information to identify dropout students and crucial factors associated with dropout probability as soon as students enter universities. The extracted rules could be utilized to enhance student retention.
Originality/value
Although first-year student dropout has received sustained attention from researchers in educational practice and theory worldwide, many previous studies used during-semester and/or post-semester factors, and/or questionnaires, for prediction. These methods failed to offer universities early warning systems (EWS) and/or to assist them in providing timely assistance to dropouts who face economic difficulties. This work provided universities with an EWS and extracted rules for early dropout prevention and intervention.
Xiaoyu Yang, Zhigeng Fang, Xiaochuan Li, Yingjie Yang and David Mba
Abstract
Purpose
Online health monitoring of large complex equipment has become a trend in the field of equipment diagnostics and prognostics due to the rapid development of sensing and computing technologies. The purpose of this paper is to construct a more accurate and stable grey model based on similar information fusion to predict the real-time remaining useful life (RUL) of aircraft engines.
Design/methodology/approach
First, a referential database is created by applying multiple linear regressions on historical samples. Then similarity matching is conducted between the monitored engine and historical samples. After that, an information fusion grey model is applied to predict the future degradation trajectory of the monitored engine considering the latest trend of monitored sensory data and long-term trends of several similar referential samples, and the real-time RUL is obtained correspondingly.
Findings
The results of comparative analysis reveal that the proposed model, which is called similarity-based information fusion grey model (SIFGM), could provide better RUL prediction from the early degradation stage. Furthermore, SIFGM is still able to predict system failures relatively accurately when only partial information of the referential samples is available, making the method a viable choice when the historical whole life cycle data are scarce.
Research limitations/implications
The prediction of the SIFGM method is based on a single monotonically changing health indicator (HI) synthesized from monitoring sensory signals, which is assumed to be highly relevant to the degradation processes of the engine.
Practical implications
The SIFGM can be used to predict the degradation trajectories and RULs of those online condition monitoring systems with similar irreversible degradation behaviors before failure occurs, such as aircraft engines and centrifugal pumps.
Originality/value
This paper introduces similarity information into the traditional GM(1,1) model to make it more suitable for long-term RUL prediction, and also provides a solution for similarity-based RUL prediction with limited historical whole life cycle data.
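For readers unfamiliar with the traditional GM(1,1) model mentioned above, the following is a minimal sketch of the standard textbook form: accumulate the series, fit the development coefficient a and grey input b by least squares on the background values, then forecast from the exponential time response. It is not the SIFGM method (the similarity matching and information fusion steps are not shown), and the function names and sample series are assumptions made for illustration.

```python
import math


def fit_gm11(series):
    """Estimate the GM(1,1) parameters (a, b) from a positive series."""
    if len(series) < 4:
        raise ValueError("GM(1,1) is normally fitted on at least 4 points")
    # First-order accumulated series x1.
    cumulative, total = [], 0.0
    for value in series:
        total += value
        cumulative.append(total)
    # Background values z[k]: mean of consecutive accumulated values.
    z = [0.5 * (cumulative[k] + cumulative[k - 1]) for k in range(1, len(series))]
    y = list(series[1:])
    # Ordinary least squares for y = slope * z + b, where slope = -a.
    n = len(z)
    sum_z, sum_y = sum(z), sum(y)
    sum_zz = sum(v * v for v in z)
    sum_zy = sum(zi * yi for zi, yi in zip(z, y))
    slope = (n * sum_zy - sum_z * sum_y) / (n * sum_zz - sum_z * sum_z)
    b = (sum_y - slope * sum_z) / n
    return -slope, b


def predict_gm11(series, a, b, steps):
    """Return fitted values for the observed points followed by `steps` forecasts."""
    if abs(a) < 1e-12:
        raise ValueError("a is too close to zero for the exponential response")
    first = series[0]

    def accumulated(k):
        # Time response of the accumulated series.
        return (first - b / a) * math.exp(-a * k) + b / a

    values = [first]
    for k in range(1, len(series) + steps):
        values.append(accumulated(k) - accumulated(k - 1))
    return values


# Hypothetical health-indicator readings growing about 10% per step.
data = [100.0, 110.0, 121.0, 133.1, 146.41]
a, b = fit_gm11(data)
forecast = predict_gm11(data, a, b, steps=1)
print(f"a={a:.4f}, b={b:.2f}, next value ~ {forecast[-1]:.2f}")
```

A negative a indicates a growing series. The plain model extrapolates only the monitored unit's own trend, which is the limitation the abstract says similarity information is meant to address.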
Rahila Umer, Teo Susnjak, Anuradha Mathrani and Suriadi Suriadi
Abstract
Purpose
The purpose of this paper is to propose a process mining approach to help in making early predictions to improve students’ learning experience in massive open online courses (MOOCs). It investigates the impact of various machine learning techniques in combination with process mining features to measure the effectiveness of these techniques.
Design/methodology/approach
Student data (e.g. assessment grades, demographic information) and weekly interaction data based on event logs (e.g. video lecture interaction, solution submission time, time spent weekly) have guided this design. This study evaluates four machine learning classification techniques used in the literature (logistic regression (LR), Naïve Bayes (NB), random forest (RF) and K-nearest neighbor) to monitor weekly progression of students’ performance and to predict their overall performance outcome. Two data sets have been used: one with traditional features and a second with features obtained from process conformance testing.
Findings
The results show that the techniques used in the study are able to make predictions on the performance of students. Overall accuracy (F1-score, area under curve) of machine learning techniques can be improved by integrating process mining features with standard features. Specifically, the LR and NB classifiers outperform the other techniques in a statistically significant way.
Practical implications
Although MOOCs provide a platform for learning in a highly scalable and flexible manner, they are prone to early dropout and low completion rates. This study outlines a data-driven approach to improve students’ learning experience and decrease the dropout rate.
Social implications
Early predictions based on an individual’s participation can help educators provide support to students who are struggling in the course.
Originality/value
This study outlines the innovative use of process mining techniques in education data mining to help educators gather data-driven insight on student performance in the enrolled courses.
Mingyan Zhang, Xu Du, Kerry Rice, Jui-Long Hung and Hao Li
Abstract
Purpose
This study aims to propose a learning pattern analysis method which can improve a predictive model’s performance, as well as discover hidden insights into micro-level learning patterns. Analyzing students’ learning patterns can help instructors understand how their course design or activities shape learning behaviors; depict students’ beliefs about learning and their motivation; and predict learning performance by analyzing individual students’ learning patterns. Although time-series analysis is one of the most feasible predictive methods for learning pattern analysis, the literature indicates that current approaches cannot provide holistic insights about learning patterns for personalized intervention. This study identified at-risk students by micro-level learning pattern analysis and detected pattern types, especially at-risk patterns that existed in the case study. The connections among students’ learning patterns, corresponding self-regulated learning (SRL) strategies and learning performance were finally revealed.
Design/methodology/approach
The method used a long short-term memory (LSTM) encoder to process micro-level behavioral patterns for feature extraction and compression, so that the students’ behavior pattern information was saved into encoded series. The encoded time-series data were then used for pattern analysis and performance prediction. Time series clustering was performed to interpret the unique strength of the proposed method.
Findings
Successful students showed consistent participation levels and balanced behavioral frequency distributions. The successful students also adjusted their learning behaviors to meet course requirements. The three at-risk pattern types showed low-engagement (R1), low-interaction (R2) and non-persistent (R3) characteristics. Successful students showed more complete SRL strategies than failed students. Political Science had higher at-risk chances in all three at-risk types. Computer Science, Earth Science and Economics showed higher chances of having R3 students.
Research limitations/implications
The study identified multiple learning patterns which can lead to the at-risk situation. However, more studies are needed to validate whether the same at-risk types can be found in other educational settings. In addition, this case study found that the distributions of at-risk types varied across subjects. The relationship between subjects and at-risk types is worth further investigation.
Originality/value
This study found that the proposed method can effectively extract micro-level behavioral information to generate better prediction outcomes and depict students’ SRL strategies in online learning. The authors confirm that the research in their work is original, and that all the data given in the paper are real and authentic. The study has not been submitted to peer review and has not been accepted for publishing in another journal.