Finding the fuel of the Arab Spring fire: a historical data analysis

Purpose – This paper aims to address the reasons behind the varying levels of volatile conflict and peace as seen during the Arab Spring of 2011 to 2015. During this time, higher rates of conflict transition occurred than normally observed in previous studies for certain Middle Eastern and North African countries.

Design/methodology/approach – Previous prediction models decrease in accuracy during times of volatile conflict transition. Also, proper strategies for handling the Arab Spring have been highly debated. This paper identifies which countries were affected by the Arab Spring and then applies data analysis techniques to predict a country’s tendency to suffer from high-intensity, violent conflict. A large number of open-source variables are incorporated by implementing an imputation methodology useful to conflict prediction studies in the future. The imputed variables are implemented in four model building techniques: purposeful selection of covariates, logical selection of covariates, principal component regression and representative principal component regression, resulting in modeling accuracies exceeding 90 per cent.

Findings – Analysis of the models produced by the four techniques supports hypotheses which propose political opportunity and quality of life factors as causations for increased instability following the Arab Spring.

Originality/value – Of particular note is that the paper addresses the reasons behind the varying levels of volatile conflict and peace as seen during the Arab Spring of 2011 to 2015 through data analytics. This paper considers various open-source, readily available data for inclusion in multiple models of identified Arab Spring nations in addition to implementing a novel imputation methodology useful to conflict prediction studies in the future.


Introduction
From the self-immolation of Muhammad Al-Bouazizi to the prolonged occupation of the Islamic State of Iraq and Syria (ISIS), what is now known as the Arab Spring caused a complex shift in stability in a subsection of the Middle East and North African region. This sudden shift which saw varied responses and outcomes to the outburst of protests from long-standing regimes in the region was unexpected by the USA and the rest of the Western world (Gauss, 2011). Four of these regimes were overthrown and the vacuums of power resulting were filled with chaos in some nations and stability in others. Other countries responded to the movements in the region with reformation without toppling regimes but saw similar, varied results. The reactionary policies by the rest of the world, enacted following the initial sparks of instability in 2011, have been highly debated. This paper offers postdictive analysis on how to most effectively allocate resources and alter policy outcomes in the latent environment of future high-conflict transition regions. Boekestein (2015), Shallcross (2016) and Leiby (2017) provide a strong framework from which possible outcomes can be analyzed using political, military, economic, social, infrastructural and information systems (PMESII) data. This study furthers their research by focusing on a specific anomaly of conflict shock to a region known as the Arab Spring. The first necessary step for analyzing this anomaly is to identify which countries were affected by the Arab Spring which in turn can be thought of as a definition for the Arab Spring. This study adopts the definition proposed by Costello et al. (2015) that nations experienced an "Arab Awakening" if they saw an increase in violent or nonviolent protests in 2011, the first year of the Arab Spring. According to their research, which measured protests based on AP news reports in the region, 11 countries saw such an awakening. 
The total country list used in this study adopts those 11 countries, along with Saudi Arabia for analysis as displayed in Table I.
Saudi Arabia was included because of the government's harsh reaction to initial protests from the minority Shiite group. These protests diminished quickly, although not because of a lack of fervor from protesters. This signifies Saudi Arabia as an interesting case in the diverse reaction to the Arab Spring. In general, an increase in demonstrations is a useful standard for consideration, as it is recognizable for future application and describes the tools by which change occurred during the Arab Spring.
Along with this definition of the Arab Spring region, this study also sought to uncover hypothesized factors affecting conflict, especially relating to the unique case of the Arab Spring. Costello et al. (2015) test four commonly debated hypotheses regarding the causes of the Arab Spring. These four hypothesized factors are a growing, young, educated population, a democratic deficit or long-standing authoritarian rule, political opportunity and the growth of the new communications media through cell phones and the Internet. Costello et al. (2015) argue that the growing number of young, unemployed citizens in the region spurred economic incentives to protest against standing governments. This first factor is closely related to the work by Urdal (2006) on the effect of a "Youth Bulge" on conflict. Urdal found that countries with a youth bulge are more likely to experience increased terrorism when coupled with either low economic growth or high tertiary (college) education participation. Costello et al. (2015), however, did not find the Youth Bulge factor to be significant for explaining Arab Spring protests, as they did not incorporate an unemployment rate for the younger cohort. This study contends that incorporating a factor similar to the conclusion from the Urdal study may be significant in predicting Arab Spring conflict. The democratic deficit and political opportunity hypothesized factors from Costello et al. (2015) are found to be inverses of each other. If the lack of democracy incites citizens to protest, then political openness should have an inverse relationship with increased levels of conflict. Using qualitative-driven, numeric ratings for political openness however, they found that protests were driven by increased levels of political openness. 
This means that when citizens had increased means of affecting the government, they were more likely to protest as they likely felt that their efforts would create real change or at least that they did not fear backlash from protesting. The final factor tested by Costello et al. (2015) was the increased usage of cell phones in the region. They assert that the use of cell phones and social media is an important consideration for the escalation of protests. Wolfsfeld et al. (2013) challenge the propensity to credit the advent of social media as a main impetus for the increase in severity and violence seen in the Arab Spring. Their study hypothesizes that widespread media is a reaction and not a cause of political uprising and that the effect of social media is different depending on the political environment in which the protests occur. Khondker (2015) focuses on this idea and analyzes whether technology and social media can be attributed as an impetus for the efficacy of the Arab Spring. Interestingly, he found that technology does have some efficacy as an instrument for change, but that governments can also use these tools to further repress the masses. Because of this contention, technology use is analyzed in this study.
This study also views the role of a regional effect as suggested by Boekestein (2015) who created a variable called Border Conflict which maps the level of conflict of neighboring countries weighted by the per cent of border shared. This effect might be useful in the study of the Arab Spring because of the locality of the conflict and the rapidity of its proliferation. Along with the regional factor, this article uses the insights gleaned from Boekestein (2015) as well as its follow-on studies by Shallcross (2016) and Leiby (2017) to develop a useful methodology for conflict prediction in the Arab Spring era.
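The border-weighted idea behind the Border Conflict variable can be illustrated with a minimal sketch. The exact formula used by Boekestein (2015) is not given here, so the sketch assumes a simple weighted sum of neighbors' conflict levels, with weights equal to each neighbor's share of the border; the function name and the example values are hypothetical.

```python
def border_conflict(neighbors):
    """Weighted conflict level of neighboring countries.

    neighbors: list of (border_share, conflict_level) pairs, where
    border_share is the fraction of the country's border shared with
    that neighbor (an assumed definition, not taken from the source).
    """
    return sum(share * level for share, level in neighbors)


# Hypothetical country with three neighbors at conflict levels 4, 1 and 0
example_neighbors = [(0.5, 4), (0.3, 1), (0.2, 0)]
score = border_conflict(example_neighbors)
```

Under this assumption, a country whose longest border is shared with a neighbor at war receives a higher score than one whose warring neighbor shares only a short border.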

Methodology
The theses by Boekestein (2015), Shallcross (2016) and Leiby (2017) use a binary conflict dependent variable in logistic regression analysis to test multiple PMESII factors for conflict prediction. This study uses these papers as baselines for both methodology and analysis. An overview of the methodology for this study is depicted in Figure 1, including the three main research efforts, organized by row: data preparation, model building and model validation and analysis. Novel to this methodology are the inspection of imputation methods for PMESII data, the testing of four different model building techniques and the sensitivity analysis of prediction outcomes.
The data used in this study came from multiple, open-source agencies with the majority of variables originating from the UN World Bank (The World Bank, 2017a). The initial dataset consisted of 79 independent, PMESII variables containing a majority of non-missing elements and a binary, dependent variable for the 12 Arab Spring countries over the years 2011 to 2015. This provided 60 country-year observations. Three-year time trends were also calculated for each variable with the naming convention 3YT and the variable name following. The dependent variable is a transformation of the Heidelberg Institute for International Conflict Research (HIIK) conflict barometer which has six levels 0, 1, 2, 3, 4 and 5 from no conflict (0) to war (5) (Heidelberg Institute for International Conflict Research, 2016). This study created a conflict identifier, classifying countries as being in "high-intensity, violent conflict" if they achieved HIIK values of four or five, which are classified as limited war and war, respectively, and not in "high-intensity, violent conflict" if they achieved HIIK values zero through three, classified as no conflict to violent crisis. This delineation is a departure from the baseline conflict prediction studies that considered countries to be in conflict if they achieved HIIK values of three or greater. The hope is that the new classification will alert decision-makers to alarming scenarios in which a significant threat to human life is apparent, accounting for the limited economic and human resources available to the US military.
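The transformation of the HIIK barometer into the binary dependent variable can be sketched as follows. This is an illustrative sketch only; the function name and the `threshold` parameter are hypothetical, with the default of 4 reflecting this study's classification and 3 reflecting the baseline studies.

```python
def high_intensity_conflict(hiik_level, threshold=4):
    """Return 1 if the HIIK level meets the conflict threshold, else 0.

    hiik_level: integer from 0 (no conflict) to 5 (war).
    threshold: 4 for this study's "high-intensity, violent conflict";
               3 reproduces the baseline studies' definition.
    """
    if hiik_level not in range(6):
        raise ValueError("HIIK level must be an integer from 0 to 5")
    return 1 if hiik_level >= threshold else 0


# A violent crisis (level 3) is not counted as conflict in this study,
# but would have been under the baseline definition.
this_study = high_intensity_conflict(3)
baseline = high_intensity_conflict(3, threshold=3)
```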

Imputation
The 79 variables in the original dataset contained 4.95 per cent missing values with 48 of the variables requiring imputation. To address missingness, this study attempted to find an improved method of imputation from past PMESII studies, some of which used single imputation using simple regression techniques and did not include adequacy measures of imputed values (Shallcross, 2016). Central to this study's imputation techniques is the assumption that the magnitude of each country's observations should be independent of other countries, foregoing regional time trends. As a result of this assumption, all variables missing an entire country's observations were removed from consideration, reducing the total variable pool to 54, 23 of which required imputation. The majority of the PMESII data missingness was caused by one of two factors: either the most recent data, in this case 2015, were missing because the open-source data compilers had not yet produced these statistics, or values were missing from countries during years of intense political turmoil when institutions were unable to collect data because of low government stability or identity. These two categories make a unique case for the missingness of PMESII data, which also falls under the category known as missing at random (MAR) (Kang, 2013).
The classification of the PMESII dataset as being MAR allows for the application of the technique called multiple imputation by chained equations, or MICE (van Buuren and Groothuis-Oudshoorn, 2011). MICE is a method that can apply several types of relevant imputation methods, such as linear regression or classification trees, to each variable with missing data. Using the specified methods, it calculates m completed data sets through Monte Carlo simulation to capture the variability and uncertainty of the missing data, where m is set by the analyst and is classically set to ten (van Buuren, 2007). Most applications of MICE suggest keeping all m sets throughout the analysis and pooling the results at the end of the study, as this has been shown to maintain the most realistic variability. In this case, however, the m imputed datasets were pooled prior to analysis, as the low rate of imputed data (4.51 per cent) should not produce a meaningful difference throughout the analysis. Although it is held by Schafer (1999) that the premature pooling of imputations may cause a slight bias, using the mean of the distribution of estimates still allows for some of the variability of the missing data as well as the possibility of non-normally distributed observations. These attributes are often lost in single imputation techniques, which provide inflated precision estimates. For each variable with missing values, five imputation methods were tested: predictive mean matching, classification and regression trees, Bayesian logistic regression, random forest and unconditional mean.
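The early pooling step described above can be illustrated with a short sketch. It assumes the m completed datasets have already been produced by a MICE implementation and shows only the averaging of imputed values; the function name and data layout are hypothetical.

```python
from statistics import mean


def pool_imputations(observed, imputed_sets):
    """Pool m completed versions of one variable into a single series.

    observed: list of values with None marking missing entries.
    imputed_sets: list of m completed lists (same length as observed).
    Observed values are kept; each missing entry is replaced by the
    mean of its m imputed values.
    """
    pooled = []
    for idx, value in enumerate(observed):
        if value is not None:
            pooled.append(value)
        else:
            pooled.append(mean(s[idx] for s in imputed_sets))
    return pooled


# Hypothetical variable with one missing entry and m = 2 imputations
observed = [1.0, None, 3.0]
imputed_sets = [[1.0, 2.0, 3.0], [1.0, 4.0, 3.0]]
pooled = pool_imputations(observed, imputed_sets)
```

Averaging in this way discards the between-imputation variance that a final pooling of model results would retain, which is the source of the slight bias noted by Schafer (1999).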
The adequacy of imputations was tested by accepting those imputations that most closely followed the distribution of the given data. This was conducted by comparing given data to imputations using the non-parametric Anderson-Darling and Kolmogorov-Smirnov tests according to the research by Engmann and Cousineau (2011). Both of these tests use the null hypothesis that the two samples do, in fact, come from the same parent distribution. As mentioned previously, the data requires the assumption that the magnitudes of each country's observations are independent of each other. Because of this, imputations had to be compared within countries. This proved to be difficult, as some countries were missing only one value for the given variable, making a non-parametric, two-sample test infeasible, and it reduced the usefulness of imputations, as inferences would be calculated using fewer than five observations within each country. Instead, a statistic was developed according to equation (1) that normalized the observed data prior to imputation:

$$Z_i^{c} = \frac{X_i^{c} - \bar{X}^{c}}{s^{c}}, \qquad i \in \{1, 2, \ldots, n\}, \qquad c = \text{country} \in \{1, 2, \ldots, 12\} \qquad (1)$$

where $\bar{X}^{c}$ and $s^{c}$ are the mean and standard deviation of the observed values for country $c$. The normalization ensured that the preferred method created imputations from the same parent distribution as the corresponding observations within each country while using all 60 observations for imputation calculation. Key to this approach was the assumption that each variable is distributed according to the same family of distributions with possibly different distribution parameters. Imputation could thus be run for all countries within a variable simultaneously under the assumption that all countries come from the same family of distributions, as they should all trend similarly during observed years. This allowed the use of non-parametric testing for all variables and increased the statistical insights by increasing the number of observations used in imputation from 5 (the observations for country c) to 60 (the observations for all countries).
The efficacy of the test on the transformed variables can be understood through the comparison of two hypothetical distributions from Country A and Country B. Country A is sampled with observations for variable i that follow a skewed Weibull distribution with mean $\mu_1$ and standard deviation $\sigma_1$. Country B is sampled with observations for variable i that follow a skewed Weibull distribution with mean $\mu_2$ and standard deviation $\sigma_2$. Once normalized, both samples would follow the same distribution with values $(X_i^{a} - \bar{X}^{a})/s^{a}$ and $(X_i^{b} - \bar{X}^{b})/s^{b}$ for Countries A and B, respectively. From there, imputation would be performed on the normalized samples from all countries. The imputation produces a normalized value with realistic variation from the common family mean and standard deviation based on the observed data for that country-year. The values are then transformed to the country of interest's distribution. Solving for $X_i^{1}$ and $X_i^{2}$ would produce observations scaled to the distributions of Countries A and B separately according to $\mu_1$ and $\sigma_1$ for Country A and $\mu_2$ and $\sigma_2$ for Country B as shown in equation (2):

$$X_i^{1} = \mu_1 + \sigma_1 Z_i, \qquad X_i^{2} = \mu_2 + \sigma_2 Z_i \qquad (2)$$

where $Z_i$ is the imputed normalized value.

Without normalizing the data within each country, the imputed data could artificially be considered to fit the given data distribution well because of the spread in magnitude of countries' values within a variable. The distribution of data without normalization is multimodal with peaks near each country's mean. This would cause imputations from the preferred strategy to fall near any of the country peaks rather than within the distribution of the country to which the imputed observation corresponds. The test value for the preferred imputation method for all variables passed above the 0.1 alpha level. This shows that MICE is an acceptable technique for creating imputations that closely follow the given data.
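The within-country normalization and the back-transformation of an imputed value can be sketched as below. This is a minimal illustration under stated assumptions: the sample standard deviation is used (the text does not specify which estimator was applied), and the function names, country labels and values are hypothetical.

```python
from statistics import mean, stdev


def normalize_by_country(values_by_country):
    """Z-score each country's observed values using that country's
    own mean and sample standard deviation. None marks missing values."""
    params, normalized = {}, {}
    for country, values in values_by_country.items():
        observed = [v for v in values if v is not None]
        mu, sd = mean(observed), stdev(observed)
        params[country] = (mu, sd)
        normalized[country] = [
            None if v is None else (v - mu) / sd for v in values
        ]
    return normalized, params


def back_transform(z_value, country, params):
    """Rescale an imputed normalized value to the country's own scale."""
    mu, sd = params[country]
    return mu + sd * z_value


# Two hypothetical countries whose magnitudes differ by a factor of ten
data = {"A": [10.0, 12.0, 14.0, None], "B": [100.0, 200.0, 300.0, None]}
normalized, params = normalize_by_country(data)
imputed_a = back_transform(0.5, "A", params)
imputed_b = back_transform(0.5, "B", params)
```

The same normalized imputation (0.5 here) maps to very different raw values for the two countries, which is how the approach pools all 60 observations for imputation without mixing country magnitudes.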

Model building
Using the completed dataset, the next step in the methodology was to test four model building techniques. The techniques tested were purposeful selection of covariates (PSOC), logical selection of covariates (LSOC), principal component regression (PCR) and representative PCR (RPCR). PSOC is a method developed by Hosmer et al. (2013) that was used by Shallcross (2016) and is used as a baseline in further testing. It was hypothesized that all other techniques would build off of the weaknesses of PSOC and are compared in the validation section by analyzing the resulting models obtained through each technique. PSOC is a seven-step, statistical-based technique that allows some user input on the variables that flow in and out of the model. It has the benefit of checking many of the assumptions of logistic regression prior to the validation step. The first step checks for univariate significance of each independent variable to the dependent variable. All of the variables with significant results are then used to form a multivariate model. This full model is reduced to balance variable significance and parsimony with model significance and confounding factors. Each variable is tested for entrance into the model at multiple steps of the process making it robust to inclusion of significant variables. It also checks for linearity of variables to the logit function, as well as for significant interactions. The most troubling issue with PSOC was the inability to differentiate between highly correlated variables. The initial, full model could not be reduced based on parameter estimate significance, as the model was unstable. Instead, a separate test was conducted to differentiate between correlated variables. For each set of correlated variables, only those with the highest correlation to the dependent variable were kept for inclusion in the reduced model. 
This, however, could have eliminated variables with significant multivariate relationships and does not provide an interpretable reasoning for the inclusion of variables in the reduced model. The final model included three three-year time trends: consumer price index, human development index and refugees by origin.
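The separate test used to choose among correlated variables can be sketched as follows. The sketch assumes that "highest correlation" refers to the largest absolute Pearson correlation with the binary dependent variable, which the text does not state explicitly; the function names, variable names and values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def keep_from_correlated_set(group, data, outcome):
    """From a set of mutually correlated variables, keep the one with
    the largest absolute correlation to the dependent variable."""
    return max(group, key=lambda name: abs(pearson(data[name], outcome)))


# Hypothetical correlated pair and a binary conflict outcome
data = {"cpi_trend": [1, 2, 3, 4], "inflation": [1, 3, 2, 4]}
outcome = [0, 0, 1, 1]
kept = keep_from_correlated_set(["cpi_trend", "inflation"], data, outcome)
```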
LSOC was developed by this study to improve upon the correlation and interpretation issues faced in PSOC. Although the LSOC process is quite similar to that of PSOC, it has two divergences. To initiate the process, only variables and time trends that are backed by a researched hypothesis are analyzed for inclusion in the model. The other alteration to the PSOC technique reduces multicollinearity in the full, multivariate model by first eliminating all base variables with insignificant parameter estimates which corresponded to a trend variable that was also included in the model. This is performed according to the assumption that the momentum of a country's PMESII factors is more indicative of its future status than its simple, current status. The final model included three-year time trends for consumer price index and human development index, as well as the voice and accountability variable, which captures the population's perception of their ability to affect their government or freedom (The World Bank, 2017b). LSOC decreased multicollinearity by using hypotheses to shrink the pool of candidate variables.

PCR and RPCR attempt to reduce the multicollinearity through near-orthogonal, principal components. After calculating principal components using the z-score standardized dataset of all variables and time trends, the components were considered for inclusion in a separate model. Hadi and Ling (1998) remind PCR practitioners to incorporate all components in model building, including those that explain only a small amount of variation of the data, as the calculation of principal components does not take into account the relationship of the data to the dependent variable of the regression. This model building technique should be impervious to analyst bias, and may uncover more accurate PMESII factors that were not represented in the dataset, while sacrificing interpretation because of the nature of principal components. 
The final model using PCR included the components shown in Table II along with their proposed definitions and representatives, which will be described next, based upon component loadings.
RPCR began with the final model of PCR and substituted each component with one or two of the variables with highest component loadings to represent the component in order to capture its meaning but to also increase interpretability of the final model. The model was reduced and resulted in the two representatives for principal components 5 and 21 which were represented by 3YT human development index and the interaction of voice and accountability and polity.
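The choice of representatives for a component can be sketched as below, assuming that "highest component loadings" means largest loadings in absolute value. The function name and the loading values are hypothetical and are not taken from this study's results.

```python
def top_loading_variables(loadings, k=1):
    """Return the k variable names with the largest absolute loadings
    on a single principal component.

    loadings: dict mapping variable name to its loading on the component.
    """
    ranked = sorted(loadings, key=lambda name: abs(loadings[name]),
                    reverse=True)
    return ranked[:k]


# Hypothetical loadings for one component
component_loadings = {
    "3YT HDI": 0.71,
    "Polity": -0.12,
    "Voice and accountability": 0.35,
}
one_rep = top_loading_variables(component_loadings, k=1)
two_reps = top_loading_variables(component_loadings, k=2)
```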

Model analysis
The models resulting from the four model building techniques were compared with three tests of validation and two comparisons of accuracy. The models were validated using the Pearson χ² test to determine overall model fit, the Hosmer-Lemeshow (H-L) test to determine the goodness of fit throughout the entire probability spectrum and the area under the receiver operating characteristic (ROC) curve (AUC) to determine the ability of each model to discriminate between positive and negative events. The results for the validation and accuracy measures are summarized in Table III.
Table II. Principal components in the final PCR model, with proposed definitions and representatives
Component | Proposed definition | Representative
… | Low unemployment, high control of population | % Armed forces * Youth Labor
5 | High quality of life trend | 3YT HDI
13 | Economic growth, decreased international threat | 3YT CPI * 3YT Border Conflict
17 | Assistance to struggling countries | USAID * Polity
21 | Openness of government | Voice and accountability * Polity
29 | Vacuum of power | Deployed US Troops * Uneven development

Following the results of the Pearson χ² test, it was concluded that the PSOC, LSOC and RPCR models have general goodness of fit. The PCR model, however, failed this test at the 0.05 α level. In an analysis of the Pearson residuals, it was evident that the PCR model failed this test because of its poor predictive ability of the Mauritania 2011 observation. As the error was local to a single point, validation and performance statistics are included for the PCR model excluding this observation; however, this study suggests that this point is not an outlier and decides not to omit it from final consideration. The H-L test assesses whether or not the observed event rates match expected event rates in subgroups of the model population and supports goodness of fit of ten probability intervals in all models except for PCR, which failed because of the Mauritania 2011 observation. Finally, the AUC demonstrated excellent discriminability for the PSOC, LSOC and PCR models with values in excess of 0.95, but less discriminability for the RPCR model, which signaled a lack of trust in this model. The performance of each model was tested using training set accuracy and a primitive prediction of 2016 conflict. Although a test of validation set data is preferred in determining a model's performance, this study was limited to 60 country-year observations and contends that a reduction of this training set would decrease the reliability of the results from a logistic regression study analyzing many independent variables (Shallcross, 2016). 
This study also suggests that the models generated from the Arab Spring data should not be considered for conflict prediction in years following 2016 as the abnormal conflict volatility that defined the Arab Spring has likely subsided into normal conflict transition behavior. Accepting these assumptions, accuracies were calculated using conflict probability cutoffs at the point for which the ROC curve of each model is farthest from the 45° line of no discriminability. The training set accuracy is maximized with the PCR model; however, the LSOC model has comparable results with both achieving accuracies in excess of 90 per cent. This training set accuracy outperforms the most commensurate models of Boekestein (2015), Shallcross (2016) and Leiby (2017). The primitive prediction of 2016 conflict was conducted to ensure the applicability of the model for future forecasting efforts. This was achieved by comparing the 2015 conflict probabilities calculated using each model formula to the conflict outcome data for 2016. This was the most realistic comparison since the 2016

Balancing model validation, accuracy and interpretability, the LSOC model was chosen as the dominant model as it passed all validation tests, unlike PCR, achieved accuracies in excess of 90 per cent, unlike PSOC and RPCR, and is interpretable because of its construction based on a researched hypothesis, which may lead to causal explanations for Arab Spring conflict. The odds ratios for this model indicate that the odds of conflict in a given country-year observation are increased with an increasing trend in the consumer price index, a decreasing trend in the human development index, both supplemented by a decreased voice and accountability rating. On a generalized level, this appears to indicate that a decreasing quality of life and a low ability to effect change in a country increase the probability of conflict. 
It supports the idea that Arab Spring conflict was more likely when factors affecting the population, which can be thought of as quality of life factors, were decreased over the three-year trends in the model. It also agrees that a decreased willingness of a government to concede to protesters, manifested by an inverse relationship to voice and accountability, leads to violent clashes with the population, which undermine the government and spark the violence for war.
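The cutoff rule used for the accuracy calculations can be sketched as follows. A point (FPR, TPR) on the ROC curve lies at distance |TPR − FPR|/√2 from the 45° line, so the farthest point above the line is the cutoff that maximizes TPR − FPR (Youden's J statistic). The function name and the example probabilities are hypothetical.

```python
def farthest_from_diagonal(probabilities, labels):
    """Return (cutoff, J) where J = TPR - FPR is largest.

    An observation is predicted to be in conflict when its
    probability is greater than or equal to the cutoff.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    best_cutoff, best_j = None, -1.0
    for cutoff in sorted(set(probabilities)):
        tp = sum(1 for p, y in zip(probabilities, labels)
                 if p >= cutoff and y == 1)
        fp = sum(1 for p, y in zip(probabilities, labels)
                 if p >= cutoff and y == 0)
        j = tp / positives - fp / negatives
        if j > best_j:
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j


# Hypothetical fitted probabilities and observed conflict outcomes
probabilities = [0.1, 0.3, 0.4, 0.6, 0.7, 0.9]
labels = [0, 0, 1, 1, 0, 1]
cutoff, youden_j = farthest_from_diagonal(probabilities, labels)
```

For a fitted logistic regression coefficient β, the corresponding odds ratio is exp(β), so a positive coefficient on the consumer price index trend corresponds to an odds ratio above one.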

Policy goal framework
This study accepts LSOC as the preferred technique for modeling the Arab Spring conflict and proposes it as a potential basis for a framework that decision-makers can use to set real policy goals in complex conflict environments such as the Arab Spring. This is done by calculating the amount by which each variable in the LSOC model would have needed to be adjusted to move all countries out of high-intensity, violent conflict. Using the LSOC optimal cutoff and holding all other factors constant, Table IV shows the increase or decrease needed for each variable to force all countries out of conflict for 2011. These changes can be thought of as conflict thresholds. The smaller conflict thresholds for Syria and Yemen indicate that a smaller change in the relevant factors would have been needed to affect 2011 conflict than for Egypt or Libya. In the world of limited resources that the US military faces, it is also useful to know which countries have the direst needs for change and which countries could be swayed from conflict with the smallest magnitude of change.
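The conflict threshold calculation can be sketched for a generic logistic regression model. With linear predictor η = β₀ + Σ βⱼxⱼ and cutoff probability c, the change in variable j that brings the predicted probability exactly to the cutoff, holding all else constant, is Δxⱼ = (logit(c) − η)/βⱼ. The coefficients, variable names and values below are hypothetical and are not the fitted LSOC estimates.

```python
from math import log


def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return log(p / (1 - p))


def conflict_threshold(coefs, intercept, x, variable, cutoff):
    """Change in one variable needed to move the predicted conflict
    probability to the cutoff, holding all other variables constant."""
    eta = intercept + sum(coefs[name] * x[name] for name in coefs)
    return (logit(cutoff) - eta) / coefs[variable]


# Hypothetical two-variable model and one country-year observation
coefs = {"cpi_trend": 2.0, "hdi_trend": -4.0}
intercept = -1.0
observation = {"cpi_trend": 1.0, "hdi_trend": 0.0}

delta_cpi = conflict_threshold(coefs, intercept, observation,
                               "cpi_trend", cutoff=0.5)
delta_hdi = conflict_threshold(coefs, intercept, observation,
                               "hdi_trend", cutoff=0.5)
```

In this illustration the country could be brought to the cutoff either by lowering the consumer price index trend by 0.5 or by raising the human development index trend by 0.25; any change beyond these amounts would move it out of the conflict classification.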

Conclusions
This study provides significant insight both to the statistical field of conflict prediction and to the political science field in the understanding of the Arab Spring. Implementation of the MICE imputation methodology using given data parametric distributions should provide realistic results for time-series conflict data. Following this, model building testing shows that although high training set accuracy can be achieved using pure statistical methods, a hypothesis-backed model may provide similar accuracy measures and allows for greater understanding of conflict. Because of the improved accuracy measures relative to previous studies, this article suggests future conflict anomalies be analyzed separately from normal conflict transition eras. Politically, this research reveals contributing factors to the behavior of Arab Spring countries' conflict tendencies. It is possible that future regional conflict shocks will be focused on factors other than quality of life and government repression, which analysts should address by using this model building methodology. The factors included in models from such a methodology create more concrete goals for decision-makers to work toward in similar situations in the future. Along with understanding the pertinent factors affecting conflict in unusual conflict climates, this research suggests that through the use of logistic regression, it is possible to determine which countries can be realistically affected to decrease their conflict threat levels. With a grasp on which countries are at risk and which countries can be most easily moved out of the at-risk classification, decision-makers can make better-informed decisions on where and how to supply resources to the region.