Identifying students at risk in academics: Analysis of Korean language academic attrition at the Defense Language Institute Foreign Language Center

Purpose – This paper aims to use a data-driven approach to identify the factors and metrics that provide the best indicators of academic attrition in the Korean language program at the Defense Language Institute Foreign Language Center.
Design/methodology/approach – This research develops logistic regression models to aid in the identification of at-risk students in the Defense Language Institute's Korean language school.
Findings – The results of this research demonstrate that this methodology can detect significant factors and metrics that identify at-risk students. Additionally, this research shows that the effects of school policy changes can be detected using logistic regression models and stepwise regression.
Originality/value – This research represents a real-world application of logistic regression modeling methods to the problem of identifying students at risk of academic failure or other negative outcomes for the purpose of academic intervention. By using logistic regression, the authors are able to gain a greater understanding of the problem and identify statistically significant predictors of student attrition that they believe can be converted into meaningful policy change.


Introduction
Teachers, instructors and leaders must identify those students or trainees most at risk for attrition from Department of Defense (DoD) educational and training programs. This permits targeted interventions to ensure that personnel production pipelines provide trained and ready service members to meet requirements in support of the national defense. Identifying those in need of support early in a program allows leadership to allocate limited intervention resources more effectively. The indicators of potential attrition vary, as does the experience level of the leaders and instructors tasked with identifying those in need. To better target interventions and mitigate varying experience levels of staff, training organizations require a data-driven approach to aid in identifying those most likely to undergo attrition. This research presents a case study in the use of statistical modeling to develop a decision aid for the Korean language program at the Defense Language Institute Foreign Language Center (DLIFLC), located on the Presidio of Monterey in Monterey, California.

Background
The Korean Department at DLIFLC trains over 300 students annually from across the services (i.e. Army, Air Force, Navy and Marine Corps). The DoD places Korean in its highest difficulty level, Category (CAT) IV, and also requires a large number of trained Korean linguists each year. Students enrolled in the program either complete their assigned course, recycle into a later class group, are reclassified into an easier language program or are dropped from the linguist program for academic or administrative reasons.
Students whose cumulative course average (CCA) falls below 2.0 are placed on academic probation, which consists of ten study sessions tailored to the student's deficiencies. Students who remain on academic probation for two probation periods are academically recycled. Students whose CCA falls below 3.0 are considered "at risk" and placed in the Special Assistance Program, where they receive extra instruction. The most important academic requirement for students is to pass the Defense Language Proficiency Test (DLPT) with a minimum score of 2 in Listening, 2 in Reading and 1+ in Speaking (2/2/1+). The highest score that a student can attain is 3/3/3 (Headquarters HQ, DLIFLC, 2013a). These standards are challenging, and some students fail academically or fail to meet the DLPT standards. When this happens, the residual costs can be high, which makes early identification of students in danger of failing, and early intervention, a priority for school faculty and administrators.
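The CCA tiers and the 2/2/1+ DLPT standard described above reduce to a few threshold checks. The sketch below is illustrative only: the names `DlptScore`, `meets_dlpt_standard` and `cca_status` are hypothetical, the "in good standing" label is ours, and encoding a plus level such as 1+ as 1.5 is an assumption made for numeric comparison, not DLIFLC's scoring convention.

```python
from dataclasses import dataclass

# Assumption: an ILR "plus" level such as 1+ is encoded as the base level + 0.5.
DLPT_MINIMUM = {"listening": 2.0, "reading": 2.0, "speaking": 1.5}  # 2/2/1+


@dataclass
class DlptScore:
    listening: float
    reading: float
    speaking: float


def meets_dlpt_standard(score: DlptScore) -> bool:
    """True if every skill meets the 2/2/1+ graduation minimum."""
    return all(getattr(score, skill) >= minimum
               for skill, minimum in DLPT_MINIMUM.items())


def cca_status(cca: float) -> str:
    """Map a cumulative course average (CCA) to the policy tiers described above.

    A CCA below 2.0 is also below the 3.0 at-risk line; the more severe tier is reported.
    """
    if cca < 2.0:
        return "academic probation"
    if cca < 3.0:
        return "at risk (Special Assistance Program)"
    return "in good standing"  # hypothetical label for students above both thresholds


print(meets_dlpt_standard(DlptScore(2, 2, 1.5)))  # True
print(meets_dlpt_standard(DlptScore(3, 2, 1)))    # False: speaking below 1+
print(cca_status(2.7))                            # at risk (Special Assistance Program)
```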

Related work
Numerous DoD-led studies have addressed student performance at DLIFLC over the years. The DLIFLC Research and Analysis Division conducted a study in 1994, followed by a 1996 NPS thesis entitled The Effects of Gender on Attrition at DLIFLC; both analyzed factors significant in predicting attrition (O'Mara et al., 1994; Arthur, 1996). These studies determined that gender alone was not associated with attrition, but identified significant interactions between gender, level of education and age. Two separate attrition studies found that students who received entry waivers for the Defense Language Aptitude Battery (DLAB), the DoD's standard test of language-learning potential and similar in purpose to the Scholastic Aptitude Test (SAT), undergo attrition at higher rates (Lee, 1990; Wong, 2004). A 1999 study concluded that semester cumulative GPAs were the best predictors of proficiency on the DLPT and that GPAs from later semesters were increasingly better predictors (DeRamus, 1999).
In 2012, DLIFLC assessed the impact of current policies on student attrition (Sayler et al., 2012). This study identified the following as significant factors in attrition: prior language experience, motivation, level of education, health and physical fitness, class size and study habits.
Shearer examined the loss of language proficiency by military linguists after graduation from DLIFLC and found that semester GPAs were strongly correlated with performance outside of school and were a good indicator of DLPT performance (Shearer, 2013).
Outside of DoD, there is an extensive body of work that addresses student performance at American universities. Logistic regression has frequently been used to predict student success or failure.
Common themes arise in these studies. Predicting which freshmen are at risk of academic attrition is difficult because few pre-college factors are available. The best commonly available predictor variables are SAT scores and high school ranking (Scalise et al., 2000; Sperry, 2015). Scalise et al. also showed that gender and students' perceptions of their chosen field of study (in this case, engineering) were not significant predictors of student success or failure (Scalise et al., 2000). Sperry's research spanned multiple academic learning communities (e.g. history, political science, science and developmental history) and found that the predictors varied significantly between communities.
Modeling student success and predicting academic attrition become significantly more accurate as student performance scores from the course of the academic year are factored in (Marbouti et al., 2016). Marbouti et al. demonstrated how analyzing standards-based student assignments over time greatly improved the ability to predict success or failure. Furthermore, this research showed that Naïve Bayes classifier models and an ensemble model built from several models (support vector machine, k-nearest neighbors and Naïve Bayes classifier) outperformed logistic regression models when predicting students at risk of academic failure.

Data
The data set for this study consists of anonymous performance and demographic data describing Korean Basic Course students who attended from 2006 through 2013. The population includes initial entry students, recycled students and post-DLPT (PDLPT) students. Recycled students would ordinarily have multiple entries, but for this data set DLIFLC removed the earlier entries for recycled students and provided only information on their final enrollment. The population consists of more than 2,000 students and encompasses all ranks and services. To properly interpret and model students "at risk" of not graduating for academic reasons, we removed students whose attrition was administrative (e.g. medical); these accounted for less than 20 per cent of the data.
This data set initially contained 47 independent variables for each student; 38 were retained and 9 were discarded because they were irrelevant to the study (e.g. duplicate factors in a different format, various student ID schemas). We defined our response variable as production rate failure, which takes the value 1 when a student fails his or her academic courses or fails to meet DLPT standards, and 0 otherwise.
The yearly production rate varies; however, the aggregate production rate from FY 2006 through FY 2013 was 69 per cent (Figure 1).
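The response coding and the production rate can be sketched as follows. The outcome labels are hypothetical (the paper does not list DLIFLC's outcome codes), and the toy cohort is invented; the only rules taken from the text are that administrative attrition is removed first and that academic or DLPT failure is coded as 1.

```python
# Hypothetical outcome labels; the paper does not list DLIFLC's actual outcome codes.
FAILURE_OUTCOMES = {"academic_failure", "dlpt_failure"}
ADMINISTRATIVE = "administrative"


def build_response(outcomes):
    """Drop administrative attrition, then code production rate failure as 1, success as 0."""
    kept = [o for o in outcomes if o != ADMINISTRATIVE]
    return [1 if o in FAILURE_OUTCOMES else 0 for o in kept]


def production_rate(response):
    """Share of (non-administrative) students who graduated and met DLPT standards."""
    return 1 - sum(response) / len(response)


# Toy cohort of ten students, one of whom left for administrative reasons.
outcomes = ["graduated_passed"] * 7 + ["academic_failure", "dlpt_failure", "administrative"]
y = build_response(outcomes)
print(y)                             # [0, 0, 0, 0, 0, 0, 0, 1, 1]
print(round(production_rate(y), 3))  # 0.778
```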
The initial variable pool consisted of 17 categorical variables, many of which required a reduction in the number of levels. Variables with only two or three levels required no change, but 13 variables had four or more levels (one had as many as 44), and we reduced each of these to five or fewer levels.
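The paper does not say how levels were merged, and in practice the grouping is often guided by subject-matter knowledge. As one possible approach, the sketch below assumes frequency-based lumping: keep the four most common levels and pool everything else into a hypothetical "Other" level, yielding at most five levels. The region codes are invented.

```python
from collections import Counter


def collapse_levels(values, max_levels=5, other="Other"):
    """Keep the (max_levels - 1) most frequent levels and pool the rest into `other`.

    Variables that already have max_levels or fewer levels are returned unchanged.
    """
    counts = Counter(values)
    if len(counts) <= max_levels:
        return list(values)
    keep = {level for level, _ in counts.most_common(max_levels - 1)}
    return [v if v in keep else other for v in values]


# Hypothetical home-region codes with seven levels.
regions = ["CA"] * 5 + ["TX"] * 4 + ["NY"] * 3 + ["FL"] * 2 + ["WA", "OR", "AK"]
collapsed = collapse_levels(regions)
print(sorted(set(collapsed)))  # ['CA', 'FL', 'NY', 'Other', 'TX']
```

Frequency-based pooling keeps every remaining level large enough to estimate a coefficient, at the cost of mixing rare but possibly dissimilar levels in "Other".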

Modeling approach
We used logistic regression to model the relationship between the binary response variable (production rate failure = 1, graduate and pass DLPT = 0) and numerous independent variables. To select the best variables and combinations, we used a five-step process.
Step 1: Variable screening and selection: Each candidate variable is examined using univariate analysis to identify variables that exhibit at least a moderate level of association with the dependent variable. In this stage, the likelihood ratio test is used to identify the variables that are sufficiently valuable to be included in the multivariable model.
To narrow the variable pool before building multivariable models, we fit a univariate logistic regression of our response variable (production rate failure = 1, graduate = 0) on each of the 34 independent variables. Following Hosmer et al. (2013), any variable whose univariate test has a p-value < 0.25 becomes a candidate for the multivariable model. This screen eliminated five variables: Years of service, Marital status, Prior language, Prior source and Prior experience.
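The univariate screen can be sketched in standard-library Python. This is a minimal illustration, not the authors' code: it handles a single numeric predictor (a categorical variable with k levels would need k − 1 dummy columns and k − 1 degrees of freedom), fits the model by Newton-Raphson, and compares the likelihood ratio statistic with a chi-square distribution on 1 degree of freedom. The variable names and data are invented.

```python
import math


def fit_univariate_logit(x, y, max_iter=50, tol=1e-10):
    """Newton-Raphson fit of logit P(y=1) = b0 + b1*x; returns (b0, b1, log-likelihood)."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        p = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
        w = [pi * (1.0 - pi) for pi in p]
        g0 = sum(yi - pi for yi, pi in zip(y, p))               # score for b0
        g1 = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))  # score for b1
        a = sum(w)                                               # information matrix [[a, b], [b, c]]
        b = sum(wi * xi for wi, xi in zip(w, x))
        c = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = a * c - b * b
        d0 = (c * g0 - b * g1) / det
        d1 = (a * g1 - b * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if max(abs(d0), abs(d1)) < tol:
            break
    p = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
    ll = sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi) for yi, pi in zip(y, p))
    return b0, b1, ll


def lr_test_pvalue(x, y):
    """Likelihood ratio test of b1 = 0; the statistic is chi-square with 1 df under H0."""
    ybar = sum(y) / len(y)
    ll_null = sum(yi * math.log(ybar) + (1 - yi) * math.log(1 - ybar) for yi in y)
    _, _, ll_full = fit_univariate_logit(x, y)
    stat = max(0.0, 2.0 * (ll_full - ll_null))
    return math.erfc(math.sqrt(stat / 2.0))  # upper tail of chi-square(1)


def screen(variables, y, alpha=0.25):
    """Keep variables whose univariate LR-test p-value is below alpha."""
    kept = {}
    for name, x in variables.items():
        p = lr_test_pvalue(x, y)
        if p < alpha:
            kept[name] = p
    return kept


# Invented data: failures (1) lean toward lower GPAs; "noise" is unrelated by construction.
y = [1, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0]
gpa = [2.1, 2.4, 2.6, 2.8, 3.0, 3.1, 3.3, 3.5, 3.6, 3.8, 3.9, 2.9]
noise = [1, 2, 1, 2, 3, 2, 1, 3, 2, 1, 3, 3]
kept = screen({"gpa": gpa, "noise": noise}, y)
print(sorted(kept))  # ['gpa']
```

The liberal 0.25 cut-off is deliberate: the screen only removes variables with essentially no univariate signal, leaving the stepwise step to make the finer choices.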
Step 2: Model building: Once the candidate variables are identified, we create a model with all candidate variables and use the stepwise method to select the variables to include in the main effects model. The stepwise procedure that we use is forward selection with a test for backward elimination (Hosmer et al., 2013). The statistical criterion we used was the Akaike Information Criterion (AIC), defined as
$$\mathrm{AIC} = -2\log L + 2p,$$
where L is the maximized value of the model's likelihood and p is the number of parameters. Each step of the stepwise procedure searches the space of possible models, adding or removing one term at a time so as to minimize AIC (Faraway, 2006). Following the stepwise procedure, we further refine the remaining variables to include only those that satisfy traditional levels of significance (p-value < 0.05). The resulting model becomes our main effects model.
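A minimal sketch of AIC-driven stepwise selection follows, assuming numeric predictors and a hand-written Newton-Raphson fit so that it runs without third-party packages. It is not the authors' implementation (in R, for example, `step()` with `direction = "both"` performs a comparable search), it omits the subsequent p < 0.05 refinement, and the data are synthetic.

```python
import math
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def logit_loglik(X, y, max_iter=50, tol=1e-10):
    """Fit a logistic regression by Newton-Raphson; return the maximized log-likelihood."""
    beta = [0.0] * len(X[0])
    for _ in range(max_iter):
        p = [1 / (1 + math.exp(-sum(b * v for b, v in zip(beta, row)))) for row in X]
        grad = [sum((yi - pi) * row[j] for row, yi, pi in zip(X, y, p))
                for j in range(len(beta))]
        info = [[sum(pi * (1 - pi) * row[j] * row[k] for row, pi in zip(X, p))
                 for k in range(len(beta))] for j in range(len(beta))]
        step = solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    p = [1 / (1 + math.exp(-sum(b * v for b, v in zip(beta, row)))) for row in X]
    return sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi) for yi, pi in zip(y, p))


def aic(columns, data, y):
    """AIC = -2 log L + 2p, where p counts the intercept plus one slope per column."""
    X = [[1.0] + [data[c][i] for c in columns] for i in range(len(y))]
    return -2 * logit_loglik(X, y) + 2 * (len(columns) + 1)


def stepwise_aic(candidates, data, y):
    """Forward selection with a backward-elimination check: at each step try every
    single addition or removal and keep the change that lowers AIC the most."""
    selected, best = [], aic([], data, y)
    while True:
        moves = [selected + [c] for c in candidates if c not in selected]
        moves += [[s for s in selected if s != c] for c in selected]
        if not moves:
            return selected, best
        score, model = min(((aic(m, data, y), m) for m in moves), key=lambda t: t[0])
        if score >= best:
            return selected, best
        selected, best = model, score


# Synthetic data (not DLIFLC data): failure odds fall as GPA rises; "noise" is unrelated.
rng = random.Random(1)
gpa = [rng.uniform(2.0, 4.0) for _ in range(200)]
noise = [rng.gauss(0.0, 1.0) for _ in range(200)]
y = [1 if rng.random() < 1 / (1 + math.exp(-(6.0 - 2.5 * g))) else 0 for g in gpa]
data = {"gpa": gpa, "noise": noise}
selected, best = stepwise_aic(["gpa", "noise"], data, y)
print(selected, round(best, 1))
```

Because AIC must strictly decrease at every accepted move, the search cannot cycle and always terminates.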
Step 3: Checking for interactions: Interactions between variables must be considered when creating a model (Hosmer et al., 2013). Each candidate interaction is added to the main effects model, and the stepwise procedure identifies the interactions to retain. The resulting model becomes the preliminary final model.
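For numeric or 0/1-coded variables, an interaction term is the product of the two main-effect columns, which can then be offered to the stepwise search alongside the main effects. The sketch below is an assumption-laden illustration: it generates every pair for brevity, whereas in practice one would usually restrict attention to interactions that are plausible on subject-matter grounds. The column names and values are hypothetical.

```python
from itertools import combinations


def add_pairwise_interactions(data, main_effects):
    """Return a copy of `data` with a product column for every pair of main effects."""
    out = dict(data)
    for a, b in combinations(main_effects, 2):
        out[f"{a}:{b}"] = [x * z for x, z in zip(data[a], data[b])]
    return out


# Hypothetical columns: first-semester GPA, recycled flag (0/1) and DLAB score.
data = {"gpa_sem1": [3.2, 2.5, 3.8], "recycled": [0, 1, 0], "dlab": [110, 102, 125]}
expanded = add_pairwise_interactions(data, ["gpa_sem1", "recycled"])
print(expanded["gpa_sem1:recycled"])  # [0.0, 2.5, 0.0]
```

Here the product column carries a student's GPA only when that student was recycled, so its coefficient measures how the GPA effect differs for recycled students.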

Step 4: Validate model: In the final step, we assess the model's adequacy for predictive use. The receiver operating characteristic (ROC) curve is a product of signal detection theory; it plots the probability of detecting a true signal (sensitivity) against the probability of a false signal (1 − specificity) across a range of possible cut points. The area under the ROC curve (AUC) provides a measure of discrimination, ranging from 0 to 1. Following Hosmer et al. (2013), a model with an AUC of at least 0.7 is considered acceptable, and one with an AUC of at least 0.8 excellent.
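The AUC has a convenient equivalent definition: it is the probability that a randomly chosen failure receives a higher predicted risk than a randomly chosen success, with ties counting one half. The sketch below computes it that way, traces the ROC points, and attaches Hosmer et al.'s rule-of-thumb labels. The scores are invented, and values below 0.7 are lumped together as "poor" for brevity.

```python
def auc(scores, labels):
    """Probability that a randomly chosen failure (label 1) scores higher than a randomly
    chosen success (label 0); ties count one half. Equals the area under the ROC curve."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def roc_points(scores, labels):
    """(false positive rate, true positive rate) for every distinct cut point."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for cut in sorted(set(scores), reverse=True):
        tp = sum(1 for s, lab in zip(scores, labels) if s >= cut and lab == 1)
        fp = sum(1 for s, lab in zip(scores, labels) if s >= cut and lab == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def discrimination(area):
    """Rule-of-thumb labels following Hosmer et al. (2013); below 0.7 is lumped as poor."""
    if area >= 0.9:
        return "outstanding"
    if area >= 0.8:
        return "excellent"
    if area >= 0.7:
        return "acceptable"
    return "poor"


scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]  # hypothetical predicted P(failure)
labels = [1, 1, 0, 1, 0, 0]
area = auc(scores, labels)
print(round(area, 3), discrimination(area))  # 0.889 excellent
```

Integrating the `roc_points` curve with the trapezoid rule gives the same value as `auc`, which is a useful consistency check.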

Modeling results
We created four initial models using FY 2006 through FY 2013 data that represent four major academic milestones for students. These models (Models 1-4) identify factors that place students at higher risk of academic attrition at four points in their program. During the analysis, we saw that the year group a student belonged to was a statistically significant factor in predicting success or failure. We suspected that this was due to curriculum and administrative changes enacted over that time frame. As a result, we created four similar models (Models 5-8) in which we limited the data set to reflect only current academic conditions at DLIFLC, which we identified as FY 2011 through FY 2013. This allowed us to create a set of models that did not rely on graduation year and could identify factors that are more relevant to students currently learning Korean. The four academic milestone models are as follows.
Models 1 (FY 06-13) and 5 (FY 11-13) identify at-risk students when they initially enter the program, before they begin taking classes. These models include all students in the data, and the independent variables are limited to student demographics.
Models 2 (FY 06-13) and 6 (FY 11-13) identify at-risk students after they have completed the first semester. These models include only students who successfully completed the first semester, and first semester GPAs are included.
Models 3 (FY 06-13) and 7 (FY 11-13) identify at-risk students after they have completed semester two. We include only data from students who successfully completed the second semester. Additionally, second semester GPAs are included in these models.
Models 4 (FY 06-13) and 8 (FY 11-13) represent students who have completed all three semesters of the Korean Program and are prepared to take the DLPT. These models identify students at risk of failing the DLPT and essentially model proficiency rate. They make use of all independent variables and are built using only the students who successfully completed all three semesters.
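The eight data sets differ only in the fiscal-year window and in how far a student must have progressed, which suggests a simple filtering routine. The sketch below is hypothetical throughout: the `Student` record, its field names and the course names are ours, and filtering on the fiscal year of graduation (or departure) is our reading of "graduation year". It assumes that each milestone model also carries the GPAs from earlier semesters.

```python
from dataclasses import dataclass, field


@dataclass
class Student:
    grad_fy: int               # assumed: fiscal year of graduation (or departure)
    semesters_completed: int   # 0-3
    failed: int                # production rate failure: 1 = failed, 0 = graduated and passed DLPT
    demographics: dict         # e.g. {"recycled": 0, "dlab": 110}
    course_gpas: dict = field(default_factory=dict)  # {semester: {course: gpa}}


# Model number -> (milestone, fiscal-year range); milestone 0 = program entry,
# k = after semester k. Models 1-4 use FY 2006-2013, Models 5-8 use FY 2011-2013.
MODEL_SPECS = {m: ((m - 1) % 4, (2006, 2013) if m <= 4 else (2011, 2013))
               for m in range(1, 9)}


def milestone_dataset(students, milestone, fy_range):
    """Feature rows and responses for one milestone model."""
    lo, hi = fy_range
    rows, y = [], []
    for s in students:
        if not lo <= s.grad_fy <= hi or s.semesters_completed < milestone:
            continue  # outside the window, or never reached this milestone
        features = dict(s.demographics)
        for sem in range(1, milestone + 1):  # GPAs from every completed semester so far
            for course, gpa in s.course_gpas.get(sem, {}).items():
                features[f"s{sem}_{course}"] = gpa
        rows.append(features)
        y.append(s.failed)
    return rows, y


students = [
    Student(2012, 3, 0, {"recycled": 0},
            {1: {"reading": 3.4}, 2: {"listening": 3.1}, 3: {"speaking": 3.0}}),
    Student(2012, 1, 1, {"recycled": 1}, {1: {"reading": 2.2}}),
    Student(2008, 3, 1, {"recycled": 0},
            {1: {"reading": 2.9}, 2: {"listening": 2.6}, 3: {"speaking": 2.4}}),
]
rows7, y7 = milestone_dataset(students, *MODEL_SPECS[7])
print(rows7, y7)  # [{'recycled': 0, 's1_reading': 3.4, 's2_listening': 3.1}] [0]
```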
Figures 2 and 3 show the ROC curves for each set of models. Curves that bow toward the top left-hand corner of the plot have a higher AUC and indicate better models than curves lying close to the diagonal.
Limiting the data to FY 2011 through FY 2013 produced better predictive models than using all eight years of data: the AUCs of Models 5 through 8 exceeded those of their counterparts in Models 1 through 4. All models that included course GPAs achieved acceptable discrimination and were suitable for use as a prediction tool. Although Models 1 and 5 did not have AUCs high enough to be classified as acceptable (AUC ≥ 0.70), they were still useful because they identified factors that placed a new student at risk (Table I).
By comparing all the models, we gained some important insights into identifying students at risk of academic attrition. First, there was little improvement in AUC between the second semester models (Models 3 and 7) and the third semester models (Models 4 and 8). This indicates that an academic outcome can be predicted at the end of the second semester nearly as well as at the end of the third semester. Second, Model 8 was the strongest model and showed that, of the 15 academic courses that students took over their 18-month curriculum, only five were highly significant predictors of success or failure. Third, two factors appeared in all models: the service a student was from and whether the student had been recycled (Table II). GPAs from multiple courses were also important; for each such course, a higher GPA increased the odds of completing the program.

Discussion
This research identified factors that provide insight into student attrition risk. Demographic factors such as sex, years in service and marital status were generally insignificant predictors in a multivariable model and became increasingly less significant in the presence of academic performance measures, such as semester course GPAs. There were, however, some demographic and enrollment factors that were consistently significant predictors of higher odds of graduation failure. These factors are as follows.
Graduation year: Models 1 through 4, which used data from FY 2006 through FY 2013, all contained graduation year as a significant factor for predicting graduation failure. Students who graduated in FY 2006 and FY 2007 were at much greater risk than students in later years. This suggests that curriculum and policy changes within the Korean Program may have led to significant improvements in student production rates.
Pay grade: In Models 1 through 6, officers were identified as being at higher risk than their junior enlisted peers. In models built using all eight years of data, non-commissioned officers (NCOs) seemed to perform as well as junior enlisted students. In models built using only the past three years of data, however, NCOs seem to perform like officers. This would indicate that under current conditions in the Korean Program, NCOs and officers have greater odds of not graduating than junior enlisted students.
Military service: In models reflecting current conditions, Army, Navy and Marine Corps students are at greater risk than Air Force students. This may indicate that the Air Force's policies and mentoring program are more effective than those of the other branches. It is worth noting, however, that Models 1 through 4 did not identify military service as a significant factor, indicating that until recently students from all the services were essentially indistinguishable in terms of performance.
DLAB score: The DLAB was only significant when modeling students who were beginning their first semester.In all other models, the DLAB became an insignificant predictor in the presence of actual academic performance measures.
Recycled student: Every model identified recycled students as having greater odds of not graduating than initial entry students. The odds that a recycled student would fail to graduate or pass the DLPT were 1.5 times those of an initial entry student. This indicates that although recycled students are capable of graduating once recycled, they remain at higher risk despite the extra instruction and resources that they receive.
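An odds ratio is not a ratio of probabilities, so the 1.5 figure is easiest to interpret by converting it back to a probability. In a logistic model the odds ratio for a 0/1 factor is exp(β), so an odds ratio of 1.5 corresponds to a coefficient of ln 1.5 ≈ 0.405. The 30 per cent baseline failure probability below is purely hypothetical.

```python
import math


def apply_odds_ratio(p_baseline, odds_ratio):
    """Probability implied by multiplying the baseline odds by an odds ratio."""
    odds = p_baseline / (1 - p_baseline) * odds_ratio
    return odds / (1 + odds)


beta_recycled = math.log(1.5)  # logistic coefficient implied by an odds ratio of 1.5
print(round(beta_recycled, 3))                # 0.405
print(round(apply_odds_ratio(0.30, 1.5), 3))  # 0.391
```

Under this hypothetical baseline, a 30 per cent failure probability for an initial entry student becomes about 39 per cent for an otherwise similar recycled student, not 45 per cent as a naive probability ratio would suggest.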
Prior language proficiency: Students who reported on DLIFLC Form 90A that they had studied a language before but had poor proficiency had, on average, 22 per cent greater odds of failure than students who reported good or excellent proficiency or who had no prior language experience. This factor was significant only in Models 1 and 5 and was not significant in the presence of course GPAs.
Semester course GPAs were the most significant modeling factors for identifying at-risk students. This was apparent from the substantial improvement of models with semester course GPAs (Models 2 through 4 and 6 through 8) over models without them (Models 1 and 5). Additionally, models built with semester course GPAs were better than those built using semester cumulative GPAs. We attribute this to the fact that not all course GPAs were significant in predicting student outcome; semester cumulative GPAs are therefore less informative because they average in these non-predictive courses. Korean culture- and history-focused courses (e.g. Intro to Korean Culture, History and Geography of the Korean Region and Korean Area/Cultural Studies) were not selected as significant predictors of student outcome because their grade distributions did not separate at-risk students from the rest of the population. Likewise, conversation-focused courses, such as Elementary, Intermediate and Advanced Korean Conversation, were not found to be significant predictors in the presence of other courses.
Models 5 through 8, which reflect the most current conditions in the Korean Program, were the strongest predictive models. The only model in this set that did not meet acceptable levels of discrimination was Model 5, which represented new students of Korean before they begin first-semester courses. Although this model was useful in identifying factors associated with academic success, we conclude that, with the pre-entry information available, there is no reliable way to predict how successful a student will be in Korean until he or she begins the program. Models 6 through 8 were excellent predictive models, with AUCs greater than 0.80. Model 8, which modeled students at risk of failing the DLPT at the end of their third semester, had a misclassification rate of only 12 per cent. These models showed that faculty could determine student outcome with reasonable accuracy after the first semester, and this accuracy improved with each subsequent semester. It is worth noting, however, that there was little improvement in AUC between Models 7 and 8 (Table III). This suggests that instructors and faculty can determine a student's outcome at the end of the second semester nearly as well as at the end of the third semester.
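A misclassification rate such as Model 8's depends on the probability cut-off used to turn predictions into at-risk/not-at-risk calls, and the paper does not state which cut-off produced the 12 per cent figure. The sketch below assumes a cut-off of 0.5 and uses invented predictions; it tallies the confusion matrix and reports the share of students classified incorrectly.

```python
def confusion_counts(probs, labels, cutoff=0.5):
    """Tally true/false positives/negatives when p >= cutoff is classified as at risk."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for p, lab in zip(probs, labels):
        predicted = p >= cutoff
        if predicted and lab == 1:
            counts["tp"] += 1
        elif predicted:
            counts["fp"] += 1
        elif lab == 1:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts


def misclassification_rate(probs, labels, cutoff=0.5):
    """Share of students whose at-risk call disagrees with their actual outcome."""
    c = confusion_counts(probs, labels, cutoff)
    return (c["fp"] + c["fn"]) / len(labels)


probs = [0.9, 0.2, 0.7, 0.4, 0.1, 0.6, 0.3, 0.8]  # hypothetical predicted P(failure)
labels = [1, 0, 1, 1, 0, 0, 0, 1]
print(confusion_counts(probs, labels))        # {'tp': 3, 'fp': 1, 'tn': 3, 'fn': 1}
print(misclassification_rate(probs, labels))  # 0.25
```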
To aid in the models' implementation, we constructed four "smart cards" for distribution to relevant DLIFLC personnel. These cards were intended to make the results of Models 5-8 usable by those charged with deciding which students should be recycled or reclassified. These personnel are knowledgeable about the academic needs of their students but not, generally, about logistic regression. Table IV shows an example of one of these cards, used to implement Model 6. The smart cards were designed to identify 70 per cent of the at-risk population and distinguish between initial entry students and recycled students. Students with course GPAs below the thresholds shown on the cards are considered at risk and are candidates for additional instruction.
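One way to turn a fitted model into card-style GPA thresholds is a two-step process. First, choose the probability cut-off that flags at least 70 per cent of students who actually failed. Second, invert the logistic equation to find the GPA at which a student's predicted risk reaches that cut-off, separately for initial entry and recycled students. The paper does not describe its exact procedure, so this is a sketch under stated assumptions. It uses a hypothetical one-course model: the recycled coefficient ln 1.5 mirrors the odds ratio reported above, while the intercept, GPA slope and predicted probabilities are invented.

```python
import math

# Hypothetical coefficients for logit P(fail) = B0 + B_GPA * gpa + B_RECYCLED * recycled.
# B_RECYCLED = ln(1.5) mirrors the 1.5 odds ratio reported for recycled students;
# B0 and B_GPA are invented for illustration.
B0, B_GPA, B_RECYCLED = 6.0, -2.5, math.log(1.5)


def cutoff_for_sensitivity(fail_probs, target=0.70):
    """Largest probability cutoff c such that flagging p >= c catches at least
    `target` of the students who actually failed."""
    ranked = sorted(fail_probs, reverse=True)
    # The small tolerance guards against floating-point error in target * n.
    k = max(1, math.ceil(target * len(ranked) - 1e-9))
    return ranked[k - 1]


def gpa_threshold(cutoff, recycled=False):
    """GPA at or below which the model's P(fail) >= cutoff (requires B_GPA < 0)."""
    logit_c = math.log(cutoff / (1 - cutoff))
    return (logit_c - B0 - (B_RECYCLED if recycled else 0.0)) / B_GPA


# Hypothetical predicted probabilities for nine students who went on to fail.
fail_probs = [0.9, 0.8, 0.75, 0.6, 0.55, 0.4, 0.35, 0.3, 0.2]
cutoff = cutoff_for_sensitivity(fail_probs)
thr_initial = gpa_threshold(cutoff)
thr_recycled = gpa_threshold(cutoff, recycled=True)
print(cutoff, round(thr_initial, 2), round(thr_recycled, 2))  # 0.35 2.65 2.81
```

Because recycled students carry a positive coefficient, their threshold comes out higher: a recycled student is flagged at a GPA that would not flag an initial entry student.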

Conclusion
This research represents a real-world application of logistic regression modeling methods to the problem of identifying students at risk of academic failure or other negative outcomes for the purpose of academic intervention. DLIFLC's Korean Program afforded us an opportune case study that we believe is in many ways applicable to academic programs at other institutions. By using logistic regression, we were able to gain a greater understanding of the problem and identify statistically significant predictors of student attrition that we believe can be converted into meaningful policy change. This research built statistical models of academic attrition using Korean Program data from FY 2006 through FY 2013. We constructed eight logistic regression models that predicted student attrition at periodic milestones: beginning Semester 1, beginning Semester 2, beginning Semester 3 and after Semester 3 but before the DLPT.
Using these models, we were able to identify demographic factors and semester courses that were significant predictors of at-risk students and to gain greater insight into the effectiveness of current grading rubrics. We were also able to determine that the year a student graduated was significant. The implication for DLIFLC and other institutions is that academic institutions and students change over time, so it is important to revisit policies and educational assumptions frequently.
Currently, DLIFLC uses a CCA < 3.0 as the general designator for an at-risk student; this is the threshold that dictates when a student must be enrolled in the Special Assistance Program. Given that approximately two of the five courses each semester were not significant factors in predicting student outcome, we recommend that the Korean Program and other academic institutions not rely exclusively on CCAs to identify at-risk students. Our research suggests that some courses are more predictive of attrition than others, and institutions should identify those courses when deciding where to employ limited intervention resources.
Not surprisingly, recycled students showed a higher risk of attrition. Additionally, at the end of the first semester, predictive models already showed excellent levels of discrimination, with little improvement when the second and third semesters' course work was included. This means that DLIFLC generally has enough predictive information at the end of the first semester to decide whether to recycle a student or remove him or her from the course, without waiting another year for more evidence that the student will likely fail. This finding is likely applicable to other academic institutions, and administrators should recognize that they may be able to predict student outcome very early in a student's program.
DLIFLC's Korean Program is a critical and successful program that has trained many DoD linguists over the years. Its current policies and practices work effectively to meet current requirements. However, this research showed that tools such as logistic regression modeling provide key insights that have the potential to create meaningful policy change.
Figure 1. Cumulative distribution of all Korean student outcomes at DLI from FY 2006 through FY 2013 after removing administrative attrition from the data set
Figure 2. Set 1 (Models 1-4) ROC curves show that models that apply to students farther along in their academic careers have higher AUCs and are thus stronger models