Likelihood and cost impact of engineering change requirements for DoD contracts

Purpose – There appears to be no empirically based method in the literature for estimating whether an engineering change proposal (ECP) will occur or the dollar amount incurred. This paper aims to present an empirically based approach to address this shortfall. Design/methodology/approach – Using the cost assessment data enterprise database, 533 contracts were randomly selected via a stratified sampling plan to build two regression models: one to predict the likelihood of a contract experiencing an ECP and the other to determine the expected median per cent increase in baseline contract cost if an ECP was likely. Both models adopted a stepwise approach. A validation set was set aside prior to any model building. Findings – Not every contract incurs an ECP; approximately 80 per cent of the contracts in the database did not have an ECP. The likelihood of an ECP and the additional amount incurred appear to be statistically independent of acquisition phase, branch of service, commodity, contract type or any other factor except for the basic contract amount and the number of contract line item numbers; both of these latter variables equally affected the contract percentage increase because of an ECP. Overall, the combined model outperformed current anecdotal approaches to ECP withhold. Originality/value – This paper both serves as a published reference point for ECP withholds in the archival forum and presents an empirically based method for determining the per cent ECP withhold to use.


Introduction and background
History suggests that, by and large, the Department of Defense (DoD) and the military departments have underestimated the cost of buying new weapon systems. Arena et al. (2006) analyzed major DoD programs and discovered these experienced nearly 46 per cent cost growth before the end of Milestone B and another 16 per cent growth by Milestone C. As a recent example, this cost growth trend continued with the Joint Strike Fighter (JSF) program. As of the 2009 Selected Acquisition Report (SAR), the JSF per-unit estimate has grown 57 per cent from its initial October 2001 estimated value.
To combat and possibly mitigate cost growth, Congress enacted the Weapon Systems Acquisition Reform Act of 2009 (WSARA), Public Law 111-23. The act created a Pentagon office, the Office of Cost Assessment and Program Evaluation (CAPE), to analyze the cost of defense programs. One particular factor related to cost growth is technical change, captured in engineering change proposals (ECPs), which can occur for many reasons. An ECP necessitates a scope change to a contract, and it can be initiated by the government, by the contractor or in response to feedback from users.
The Government Accountability Office (GAO) found that 63 per cent of major defense programs had requirement changes after system development began [Government Accountability Office (GAO), 2008]. Additionally, programs with requirement changes encountered, on average, cost growth of 72 per cent, while costs grew by only 11 per cent among programs that did not change requirements. The fundamental purpose of an ECP is to change the requirements of a contract [Engineering Change Proposal (ECP), 2017]. To build in flexibility, acquisition practice is to estimate a dollar value to hold in reserve after the contract is awarded. This amount goes by several names; for the purpose of this article, we call it ECP withhold, as it is the amount of money the Government withholds for ECPs.
There are three major cost estimating guides commonly used by Air Force cost analysts today: the Air Force Cost Analysis Handbook (AFCAH), the GAO Cost Estimating and Assessment Guide and the US Air Force Cost Risk and Uncertainty Analysis Handbook (AFCRUH). Each provides overlapping material and perspectives on how best to estimate cost and risk; however, none of the guides provides an empirically based method for estimating ECP withhold. Given this lack of guidance, practitioners use common rules of thumb for ECP withhold.
A relatively common rule of thumb in the acquisition community is that estimates may vary by 10 per cent, a figure that appears in several separate fields and disciplines. The starting amount for estimating management reserve is 5-10 per cent (Project Management Institute, 2017). The cost overrun that constitutes an acquisition program baseline breach is 10 per cent [Department of Defense (DoD), 2015]. Currently, the Air Force Life Cycle Management Center allots the following percentages: for development cost estimates, 10 per cent is added above the total estimate, while for procurement estimates, the percentage drops to 6 per cent (S. Valentine, personal communication, multiple dates, 2015-2017). Finally, the Automated Cost Estimating Integrated Tools (ACEIT) software package uses a range of 6-10 per cent.
The purpose of this article is to present empirically based models, built from historical data, that can be used not only to estimate the likelihood of a contract experiencing an ECP but also to determine the amount of ECP withhold as a percentage of the total contract cost. In addition, the study compares these models to three alternative rules of thumb. The analysis presented in this paper is at the contract level, not the program level.

Likelihood and cost impact
Database and methodology
CAPE's mission is to provide independent program analyses and insights as requested by the Under Secretary of Defense for Acquisition, Technology and Logistics and Congress. Additionally, CAPE reviews programs that may be, or already are, struggling in the acquisition process. To facilitate its mission, CAPE initiated the development of the cost assessment data enterprise (CADE), the Department's initiative to identify and integrate data from disparate databases and systems for better decision-making, management and oversight of the Department's acquisition portfolio.
As of 11 April 2017, CADE hosted a contract level database consisting of 7,343 unique contracts with details extracted from the electronic document access (EDA) system. EDA is a Web-based system that provides for storage and retrieval of not only DoD contracts but also contract modifications [Electronic Document Access (EDA), 2017]. This study uses EDA data extracted from CADE as the primary data source. This database in turn forms the basis from which to develop empirical regression models to predict not only the likelihood of a contract having an ECP but also the additional per cent increase from the ECP.
Both the large number of available contracts and the time required to check contract details manually necessitated a stratified sampling plan. The preliminary intent was to randomly sample 10 per cent of the population, but because of time constraints, the resulting percentage was closer to 9 per cent (as documented in the next section). Initial stratification used four contract elements: phase, schedule, size and type. Phase is divided into development versus other (production/operations and support). Schedule separates contracts with an initial schedule longer than one year from those with an initial schedule of one year or less. Size separates contracts with a baseline cost greater than $5m ($5,000,000) from those with a baseline cost less than or equal to $5m. Contract type separates contracts that are more than 90 per cent firm fixed price (FFP) from those that are 90 per cent FFP or less. An additional stratum accounts for very large contracts, specifically those exceeding $400m, because of the preliminary findings of Cordell (2017). This arrangement yields 16 possible strata from which to sample, in addition to the one for very large contracts. As shown in the Analysis and results section, these 17 strata ultimately collapse to 7 for sample collection purposes. Sample percentages are matched to the population percentages so that the bins correspond, and the match is checked via a paired t-test. In addition, a second paired t-test is conducted to show that the sample matches the population with respect to the percentage distributions by type of contract, branch of service and commodity. These inferential tests are conducted at a significance level of 0.05.
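A minimal Python sketch of the stratification and proportional draw described above follows. It assumes a hypothetical `Contract` record (field names are illustrative, not the CADE schema), treats one year as 365 days, and applies the special stratum at $400m or more as in the strata notes; it is not the authors' sampling code.

```python
import random
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Contract:
    """Hypothetical contract record; field names are illustrative only."""
    contract_id: str
    phase: str            # "development" or anything else (production/O&S)
    initial_days: int     # initial schedule length in days
    baseline_cost: float  # BY 2016 dollars, initial cost plus priced options
    ffp_share: float      # share of contract value that is FFP, 0.0-1.0


def stratum(c: Contract) -> str:
    """Assign a contract to one of the 16 crossed strata or the special stratum."""
    if c.baseline_cost >= 400_000_000:        # very large contracts
        return "special"
    phase = "dev" if c.phase == "development" else "other"
    schedule = "long" if c.initial_days > 365 else "short"  # assumption: 1 year = 365 days
    size = "large" if c.baseline_cost > 5_000_000 else "small"
    ctype = "ffp" if c.ffp_share > 0.90 else "non-ffp"
    return f"{phase}/{schedule}/{size}/{ctype}"


def stratified_sample(contracts, fraction, seed=0):
    """Draw roughly the same fraction from every stratum (proportional allocation)."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for c in contracts:
        groups[stratum(c)].append(c)
    sample = []
    for members in groups.values():
        k = round(fraction * len(members))
        sample.extend(rng.sample(members, k))
    return sample
```

Calling `stratified_sample(population, 0.10)` would mirror the intended 10 per cent draw; collapsing sparse strata, as was done to reach 7 strata, is not shown.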
Once the stratification plan is finalized, errant contracts are filtered out before the study's database is populated. These errors may include, for example, missing contract dates, missing contract amounts or a negative contract award, since a contract cannot have a negative value. Other errors include modifications incorrectly classified as cost modifications despite adding scope, which makes them ECPs by definition. In the next section, we list the main errors detected in building the modeling database and the number of contracts affected by each exclusion criterion. All dollar amounts are converted to base year (BY) 2016 dollars to account for the effects of inflation. Because the military appropriation category was unavailable for the contracts in the database, and because of the length of some aircraft contracts, we used the total manufacturing producer price index as reported by the Bureau of Labor Statistics for the conversions. All analysis in this article used JMP 12 Pro, Excel or R.
JDAL 2,1
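The screening and BY 2016 conversion can be sketched as below. The index values in `PPI` are placeholders, not Bureau of Labor Statistics figures, and the screening function covers only the example errors named above (missing dates, missing amounts, negative awards); the function names are hypothetical.

```python
from datetime import date
from typing import Optional

# Placeholder annual index values for illustration only (NOT actual BLS PPI data).
PPI = {2012: 95.0, 2014: 98.0, 2016: 100.0}


def is_valid(amount: Optional[float], start: Optional[date], end: Optional[date]) -> bool:
    """Reject records with a missing amount or date, or a negative award."""
    if amount is None or start is None or end is None:
        return False
    return amount >= 0


def to_by2016(amount: float, year: int, index=PPI) -> float:
    """Convert then-year dollars to BY 2016 dollars using a price-index ratio."""
    return amount * index[2016] / index[year]
```

The conversion is a simple ratio of index values, so any annual index keyed by year could be substituted for the placeholder table.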
The models presented in this article predict two response variables. The first is a binary (dichotomous) variable for the logistic regression model. If a contract has any technical ECPs, the response is a 1. If the contract has no technical ECPs, the response is a 0. The second is a continuous variable for the ordinary least squares (OLS) model. This variable is the natural logarithm of the per cent cost growth strictly relating to changing requirements.
[Note: the highly skewed percentage growth, together with other skewed predictor variables, necessitated a log-log modeling approach. This is discussed and illustrated with a graph in the next section.] The per cent ECP growth is the sum of all modifications listed as a change in requirements, divided by the contract's baseline cost. The end result of the OLS model is the predicted median per cent of ECP withhold.
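A sketch of how the two responses could be derived for one contract follows. It assumes each modification is represented as a hypothetical `(amount, is_requirement_change)` pair and keeps the per cent growth as a ratio of baseline cost, since the text does not say whether it was scaled by 100.

```python
import math
from typing import List, Optional, Tuple


def responses(baseline_cost: float,
              mods: List[Tuple[float, bool]]) -> Tuple[int, Optional[float]]:
    """Return (ecp_flag, ln_ecp_growth) for one contract.

    ecp_flag is 1 if any modification is a requirement change, else 0.
    ln_ecp_growth is ln(sum of requirement-change mods / baseline cost),
    or None when that growth is not positive (excluded from the OLS model).
    """
    ecp_flag = 1 if any(is_req for _, is_req in mods) else 0
    growth = sum(amount for amount, is_req in mods if is_req) / baseline_cost
    ln_growth = math.log(growth) if growth > 0 else None
    return ecp_flag, ln_growth
```

Note that a contract with only a de-scoping ECP still scores 1 on the binary response but has no defined log growth, which matches its exclusion from the OLS data.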
To identify possible explanatory variables associated with either the likelihood of an ECP or the median per cent of ECP withhold, we turn to the literature. Trudelle et al. (2017), Bolten et al. (2008) and Arena et al. (2006) document several variables that may predict whether a program will experience cost growth. Additionally, Harmon and Arnold (2013) assessed contract types to understand their impact on overall contract price. They determined that for a series of production contracts in which the system design is mature and stable, the best choice of contract type is FFP. This finding played a key role in defining one of the strata in the sampling plan. Table II lists the possible explanatory variables considered in the analysis to predict the likelihood of a contract experiencing an ECP as well as the expected median percentage increase. As noted in the next section, given the large number of F/A-18E/F contracts in the population, a dichotomous predictor variable, labeled F18, has been added to test statistically whether that program overly influences either the likelihood of a contract experiencing an ECP or the expected median percentage increase. Before any model building, the study's database is randomly partitioned into two components: the modeling data set and the validation data set. This is accomplished by generating a uniform random number for each contract, sorting the contracts from smallest to largest random number, and then pulling the requisite percentage for each set.
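The random partition described above can be sketched as follows; the function name and seed handling are illustrative assumptions. Sorting on a uniform key and taking the first share of the sorted list as the holdout is equivalent to a simple random split.

```python
import random


def partition(ids, holdout_fraction, seed=None):
    """Split ids into (modeling, validation) by sorting on uniform random keys."""
    rng = random.Random(seed)
    keyed = sorted((rng.random(), cid) for cid in ids)  # one uniform draw per contract
    ordered = [cid for _, cid in keyed]
    n_holdout = round(holdout_fraction * len(ordered))
    return ordered[n_holdout:], ordered[:n_holdout]
```

With the holdout shares used in this study, the logistic split would correspond to `partition(ids, 0.5)` and the OLS split to `partition(ecp_ids, 0.2)`.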
For the logistic model, because of the large sample size, approximately 50 per cent of the contracts are set aside for model validation. For the OLS model, approximately 20 per cent of the contracts that experienced an ECP are set aside for validation, since the sample is greatly reduced when modeling only those contracts with an ECP. None of the contracts in the validation set were used to create the respective statistical models.
Stratum definitions
1 Development contracts
2 Production or operations and support contracts
3 (Short) Initial contract duration equal to or less than a year
4 (Long) Initial contract duration longer than a year
5 (Small) Baseline contract cost equal to or less than $5,000,000
6 (Large) Baseline contract cost exceeds $5,000,000 but is less than $400,000,000
7 (FFP) Total per cent of initial contract type and modification contract types greater than 90% FFP
8 (Non-FFP) Total per cent of initial contract type and modification contract types equal to or less than 90% FFP
Special Baseline contract cost equals or exceeds $400,000,000
Notes: Strata pairs 1/2, 3/4, 5/6 and 7/8 are complementary events. All dollars presented in base year 2016 values. Baseline contract cost equals initial contract cost plus all priced options.
Before validating the OLS model, the customary residual assumptions of normality and constant variance are tested using the Shapiro-Wilk test and the Breusch-Pagan test, respectively, both at the 0.05 level of significance. Additionally, multicollinearity, outliers and influential data points are investigated to prevent model bias. Variance inflation factors (VIF) measure the linear relationship among the independent variables; a VIF score higher than 5 suggests multicollinearity. Regarding outliers, any observation whose studentized residual exceeds three in absolute value is categorized as an outlier and a possible source of concern. Finally, Cook's distance detects overly influential data points that may skew the results; any value greater than 0.5 is investigated closely.
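For a model with exactly two predictors, as in the final OLS model, each VIF reduces to 1/(1 − r²), where r is the correlation between the two predictors. The sketch below computes that and applies the outlier and influence cutoffs stated above to precomputed studentized residuals and Cook's distances; fitting the model and computing those two diagnostics is assumed to be done in a statistics package, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Sample Pearson correlation between two predictors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(x1: Sequence[float], x2: Sequence[float]) -> float:
    """VIF shared by both predictors when the model has exactly two of them."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)


def flag_points(studentized: Sequence[float], cooks_d: Sequence[float],
                resid_cut: float = 3.0, cook_cut: float = 0.5) -> List[int]:
    """Indices with |studentized residual| > 3 or Cook's D > 0.5."""
    return [i for i, (t, d) in enumerate(zip(studentized, cooks_d))
            if abs(t) > resid_cut or d > cook_cut]
```

A VIF of 1.0 means the two predictors are uncorrelated; values above 5 would trigger the multicollinearity concern described above.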
To assess the validity of the developed logistic regression model, a confusion matrix is used. This matrix tabulates the number of true positives, true negatives, false positives and false negatives. A cutoff of 0.5 on the predicted probability is set as the threshold separating a contract into "ECP likely" vs "ECP not likely". The validity of the OLS model is assessed on multiple criteria: mean absolute per cent error (MAPE), median absolute per cent error (MdAPE), coefficient of determination (R²) and adjusted R². Each absolute per cent error is calculated as the absolute value of the predicted response minus the actual response, divided by the actual response.
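A minimal sketch of both validation measures follows. Whether a probability of exactly 0.5 counts as "ECP likely" is not stated, so the `>=` comparison is an assumption; the error functions return fractions (multiply by 100 for per cent).

```python
from statistics import mean, median
from typing import Dict, List, Sequence


def confusion(probs: Sequence[float], actual: Sequence[int],
              cutoff: float = 0.5) -> Dict[str, int]:
    """Tally TP, TN, FP, FN, classifying 'ECP likely' when p >= cutoff."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for p, y in zip(probs, actual):
        predicted = 1 if p >= cutoff else 0
        if predicted == 1:
            counts["TP" if y == 1 else "FP"] += 1
        else:
            counts["FN" if y == 1 else "TN"] += 1
    return counts


def abs_pct_errors(pred: Sequence[float], actual: Sequence[float]) -> List[float]:
    """|predicted - actual| / |actual| for each observation."""
    return [abs(p - a) / abs(a) for p, a in zip(pred, actual)]


def mape(pred: Sequence[float], actual: Sequence[float]) -> float:
    """Mean absolute per cent error, as a fraction."""
    return mean(abs_pct_errors(pred, actual))


def mdape(pred: Sequence[float], actual: Sequence[float]) -> float:
    """Median absolute per cent error, as a fraction."""
    return median(abs_pct_errors(pred, actual))
```

Because MdAPE uses the median, a few large errors move MAPE far more than MdAPE, which is the contrast drawn later when the two are compared.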
For both the OLS model and the logistic model, a mixed stepwise procedure is adopted to arrive at the models presented in the next section. A significance level of 0.01 is used to determine the initial predictive ability of an explanatory variable. The preliminarily selected variables are then investigated to determine their practical effect on the respective model. If an explanatory variable has less than a 1 per cent relative effect on a model's response, it is excluded from the final model presented for practitioners' use. This avoids retaining a variable that is statistically significant but has little practical effect.
To finalize the presented results, four methods and their recommended ECP withholds are compared descriptively. The first method applies the regression models presented in this article. The second involves no ECP withhold (essentially assuming no additional costs for a contract). The third adopts the percentages found anecdotally in the literature: 6 per cent for development and 10 per cent for procurement contracts. The last method uses a flat average: the average per cent ECP growth across all contracts, without discrimination between life-cycle phases, applied to every contract.
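The four methods can be expressed as dollar withholds for a single contract, as in the sketch below. The regression percentage is passed in as an input; the 5.9 per cent flat average is the sample value reported later in this article; and ranking methods by absolute dollar error against the realized ECP amount is an assumed scoring rule, since the exact ranking procedure is not spelled out here.

```python
from typing import Dict


def withholds(baseline_cost: float, phase: str, regression_pct: float,
              flat_pct: float = 0.059) -> Dict[str, float]:
    """Dollar ECP withhold under each of the four compared methods.

    regression_pct and flat_pct are fractions of baseline cost.
    """
    anecdotal_pct = 0.06 if phase == "development" else 0.10
    return {
        "regression": regression_pct * baseline_cost,
        "none": 0.0,
        "anecdotal": anecdotal_pct * baseline_cost,
        "flat": flat_pct * baseline_cost,
    }


def rank_methods(withhold: Dict[str, float], actual_ecp: float) -> Dict[str, int]:
    """Rank methods from 1 (closest to the realized ECP amount) to 4."""
    order = sorted(withhold, key=lambda m: abs(withhold[m] - actual_ecp))
    return {method: rank for rank, method in enumerate(order, start=1)}
```

Averaging such ranks over many contracts gives a summary comparable in spirit to an average rank per method.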

Analysis and results
From an initial population of 7,343 contracts housed within the CADE database on 11 April 2017, 1,416 were excluded because of missing or erroneous data, resulting in an effective population of 5,927 contracts, or approximately 81 per cent of the starting number. Table III highlights the main reasons, which account for approximately 84 per cent of the exclusions. Of the 1,416 contracts removed before creating a stratified random sample for the study's analysis, a missing contract type associated with a contractual amount was the dominant exclusionary reason, accounting for approximately 43 per cent of the removed contracts. The next most common reason was a missing end date for a modification of an initial contract, which accounted for 313 contracts, or approximately 22 per cent of the excluded contracts.
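The exclusion arithmetic above, together with the roughly 9 per cent sampling fraction mentioned in the methodology, can be checked directly; the 541-contract sample size is taken from the Logistic model subsection.

```python
total, excluded = 7_343, 1_416
effective = total - excluded                          # contracts kept after screening
print(effective, round(100 * effective / total, 1))   # 5927 80.7
print(round(100 * 313 / excluded, 1))                 # missing end date share: 22.1
print(round(0.43 * excluded))                         # about 609 missing contract type
print(round(100 * 541 / effective, 1))                # sample fraction: 9.1
```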
With respect to the bin characteristics discussed in the previous section, Table IV shows the population percentages, while Table V shows the final seven selected strata percentages for both the population and the sample. The paired t-test comparing the population strata percentages to the sample strata percentages results in a p-value of 0.96, so we fail to reject the null hypothesis. Therefore, at a 0.05 level of significance, the paired t-test detects no statistically significant difference between the modeling database and the population.
Regarding acquisition phase, branch of service and commodity type, Table VI lists the percentages for both the population and the sample (Note: percentages rounded to two decimal places). Another paired t-test comparing the population percentages to the sample percentages results in a p-value of 0.99, which again supports the preceding result that the modeling database does not differ significantly from the population. Table VI also shows a high number of Navy and aircraft contracts in the population. Delving further, a large percentage of the population contracts stem from the F/A-18E/F (Super Hornet) program alone: approximately 42 per cent of all contracts originated from the F/A-18E/F, with the second highest being approximately 4 per cent for the AWACS (Airborne Warning and Control System) program. However, in terms of total cost (the sum of all contracts for a given program), the F-22 (Raptor) leads all programs and accounts for approximately 7.6 per cent of the total population contract cost, while the F/A-18E/F accounts for approximately 5.9 per cent. In total, the

Logistic model
Out of the 541 contracts in the sample database, 271 were randomly set aside for the validation set, and the remaining 270 were used as the modeling set for the logistic model. Customarily, an 80/20 ratio is used; however, because of the relatively large size of the sample database, a 50/50 split was preferable to allow for greater generalization testing. Each set contained a relatively equal number of ECPs, commensurate with the overall ECP percentage of approximately 17 per cent. Before commencing any analysis, a histogram of the baseline cost for all 541 contracts highlighted 8 contracts (approximately 1.5 per cent of the sample database) that were noticeable outliers, whereas the remaining 533 contracts (all equal to or less than $164m in Fiscal Year 2016 dollars) had a relatively smooth lognormal distribution (p-value of approximately 0.04 for the Kolmogorov-Smirnov goodness-of-fit statistic, which we judged reasonable given the large sample size). The smallest baseline cost of these eight outliers was $260m, while the largest was $2.6bn. Given their sparsity and the 10-fold difference between the lowest and highest baseline contract costs among these eight, we chose at this point to remove these contracts from consideration. This in turn removes the explanatory variable of contract size extra-large (contracts exceeding $400m) and limits the paper's inferential results to contracts of $164m or less.
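A stdlib-only sketch of the outlier screen and the lognormal goodness-of-fit distance follows; the function names are hypothetical and the $164m cap mirrors the text. Because the logarithm is monotone, the Kolmogorov-Smirnov distance between the costs and a fitted lognormal equals the distance between ln(cost) and a fitted normal. Only the D statistic is computed; converting it to a p-value is left out, and with parameters estimated from the same data the standard KS table is conservative (a Lilliefors-type correction would apply).

```python
import math
from statistics import NormalDist, fmean, stdev
from typing import List, Sequence


def drop_extreme(costs: Sequence[float], cap: float = 164e6) -> List[float]:
    """Keep baseline costs at or below the cap (BY 2016 dollars)."""
    return [c for c in costs if c <= cap]


def ks_lognormal(costs: Sequence[float]) -> float:
    """KS distance D between the costs and a lognormal fitted to ln(cost)."""
    logs = sorted(math.log(c) for c in costs)
    n = len(logs)
    fitted = NormalDist(fmean(logs), stdev(logs))
    d = 0.0
    for i, x in enumerate(logs, start=1):
        f = fitted.cdf(x)
        d = max(d, i / n - f, f - (i - 1) / n)  # check both sides of each step
    return d
```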
With this change in mind, the model building set was reduced to 265 contracts and the validation set to 267. Table VII highlights two possible models, selected by stepwise regression, for predicting the likelihood of a contract experiencing an ECP. The explanatory variable Ln(Baseline cost) [the natural logarithm of a contract's baseline cost] and the explanatory variables contract size large and contract size small are complementary in nature, since small and large contracts together span the entire gamut of baseline contract costs. Therefore, stepwise regression flagged only one model at a time as significant, but we carried both forward into validation to determine which of the two is ultimately better at predicting the likelihood of an ECP. No other explanatory variable, including the F18 variable, proved statistically significant once the cost of the contract was in the model. The overall takeaway is that the cost of the contract appears to be the overwhelmingly dominant factor in determining the likelihood of an ECP. With respect to how well both models predict a contract experiencing an ECP, Table VIII displays the confusion matrix for the model building and validation datasets. Overall, both models show a high predictive ability for detecting contracts that do not experience an ECP; however, both predict ECPs quite poorly, with Model 1 displaying some ability to predict the true likelihood of an ECP. Model 2 (as evident from Table VII) suggests a potential breakpoint somewhere in the lower cost spectrum: contracts equal to or less than $100,000 have a lower chance of incurring an ECP than contracts greater than $5m. In fact, of the 533 contracts in the sample database, 171 have a baseline cost less than or equal to $100,000, and only one of these had an ECP. We use this information later in the overall model application.
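With a single predictor, as in Model 1, the 0.5 cutoff is equivalent to a baseline-cost threshold: the classification flips where b0 + b1·ln(cost) = 0, that is, at cost = exp(−b0/b1) when b1 > 0. The sketch below shows this with hypothetical coefficients `B0` and `B1`; Table VII's estimates are not reproduced here, so the numbers serve only as illustration.

```python
import math

# Hypothetical coefficients for illustration only (not the Table VII estimates).
B0, B1 = -9.0, 0.5


def p_ecp(baseline_cost: float, b0: float = B0, b1: float = B1) -> float:
    """Logistic probability of an ECP with ln(baseline cost) as the only predictor."""
    z = b0 + b1 * math.log(baseline_cost)
    return 1.0 / (1.0 + math.exp(-z))


def cost_at_cutoff(b0: float = B0, b1: float = B1, cutoff: float = 0.5) -> float:
    """Baseline cost at which p_ecp equals the cutoff (assumes b1 > 0)."""
    return math.exp((math.log(cutoff / (1.0 - cutoff)) - b0) / b1)
```

With these made-up values the threshold is exp(18), roughly $66m, so only contracts above that cost would be classified as "ECP likely".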

OLS model
The linear (OLS) model is designed to predict the amount of cost growth solely attributable to ECPs. The response is the natural logarithm of the percentage increase. To revert to the actual expected percentage, one exponentiates the predicted value, which yields the expected median percentage increase due to incurring an ECP. As mentioned in the previous section, contracts with net negative ECP growth are not considered. Net negative growth occurs whenever a contract de-scopes effort, and such contracts cannot be used to obtain an accurate ECP withhold before contract award.
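The reason exponentiating gives a median rather than a mean can be shown in a few lines: exp is monotone, so it maps the median of ln(growth) to the median of growth, and when the log-scale errors are symmetric (as the normality diagnostics assume) the fitted log mean is also the log median. The toy values below are illustrative, not study data.

```python
import math
from statistics import fmean, median

log_growth = [-1.0, 0.0, 1.0]                   # illustrative ln(growth) values
growth = [math.exp(v) for v in log_growth]

back_transformed = math.exp(fmean(log_growth))  # exp of the mean log
print(back_transformed, median(growth), round(fmean(growth), 3))  # 1.0 1.0 1.362
```

The back-transformed value matches the median growth and sits below the mean growth, which is why the result is read as a median percentage.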
Of the 541 contracts in the original sample database, 99 experienced a contract increase because of an ECP. Of these 99, 20 were initially randomly set aside for the validation set, while the remaining 79 were used to develop the OLS model. After accounting for the eight contracts excluded because of a very large baseline contract cost (exceeding $164m), these numbers changed to 71 for the modeling dataset and 20 for the validation set.
Given the highly skewed patterns of ECP percentage increase, basic contract cost, baseline contract cost and contract schedule (in days), all of these variables were transformed via the natural logarithm. As an example, Figure 1 displays the typical right-skewed pattern of the basic contract cost along with its distribution after the transformation. Table IX highlights the preliminary model for predicting the natural log of the expected percentage increase in contract cost because of an ECP. The model has an R² of 0.37 and an adjusted R² of 0.35. No other explanatory variable proved statistically significant after accounting for the cost of the basic contract and the number of contract line item numbers (CLINs) truncated at five (that is, contracts with five or more CLINs are grouped into the five-CLIN group). In no iteration did the F18 variable prove statistically significant, consistent with the findings of the logistic model. The candidate model in Table IX passed all model diagnostics, with no issues of multicollinearity (largest VIF score of 1.13), outliers (largest studentized residual of 2.5), influential data points (largest Cook's D of 0.13), normality (Shapiro-Wilk test p-value of 0.47) or constant variance (Breusch-Pagan test p-value of 0.50).
For the OLS model, the MAPE and MdAPE are 86 and 35 per cent, respectively, for the modeling dataset and 130 and 36 per cent, respectively, for the validation dataset. The higher MAPEs relative to the lower MdAPEs reflect moderate outliers in both datasets, while the relatively comparable MdAPEs are a better measure of the model's consistency and generalization. Table X highlights the final empirical model, updated with the validation dataset and used in conjunction with the results of the previously presented logistic models. Equation (1) represents the user model in mathematical form, taking back-transformation into account:
$$\text{Expected percentage contract cost} = e^{\,3.32 - 0.30\,\ln(\text{BasicCost}) - 0.33\,(\text{CLINs on Basic Truncated at 5})} \tag{1}$$
Note: a user would apply equation (1) only to contracts whose baseline cost exceeds $100,000 but is less than or equal to $164m (both in BY 2016 dollars). For lower cost contracts (equal to or less than $100,000), the expected median percentage is zero, given their very low chance of incurring an ECP.
experiences an ECP. The second method, referenced in the literature, uses 6 per cent for development and 10 per cent for procurement contracts. The third and last method uses a simple flat average applied regardless of phase. The average ECP withhold of the 533 contracts in the sample database is approximately 5.9 per cent, and this is the value used for the flat average method. Based on the average ranking from best (1) to worst (4), equation (1) appears to be the best of the four methods presented, with an average rank of 1.8. All methods used the true final cost of a contract to determine the percentages and amounts. Final cost equals total baseline cost plus any documented ECP amounts.
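Equation (1) and its usage bounds can be wrapped in a small function, sketched below under stated assumptions: the function name is hypothetical; the units of BasicCost are not restated in this section, so the caller must supply the value in the units used to fit the model; the output is on the response's own scale; and contracts above $164m raise an error rather than being extrapolated, following the caution given in the discussion.

```python
import math

LOWER, UPPER = 100_000, 164_000_000   # baseline-cost bounds in BY 2016 dollars


def expected_ecp_pct(baseline_cost: float, basic_cost: float, clins_on_basic: int) -> float:
    """Back-transformed median ECP percentage from equation (1).

    Assumption: basic_cost is in the same units used to fit the model.
    """
    if baseline_cost > UPPER:
        raise ValueError("baseline cost above $164m: equation (1) would extrapolate")
    if baseline_cost <= LOWER:
        return 0.0                    # very low chance of an ECP at or below $100,000
    clins = min(clins_on_basic, 5)    # five or more CLINs are grouped at five
    return math.exp(3.32 - 0.30 * math.log(basic_cost) - 0.33 * clins)
```

Both coefficients are negative, so the predicted median percentage falls as the basic contract cost or the (truncated) CLIN count grows.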

Discussion and conclusion
To the best of our knowledge, no peer-reviewed source documents the amount of ECP withhold that should be set aside for DoD contracts; only anecdotal amounts were present in the literature. This paper served two purposes: first, to provide a published reference point for ECP withholds in the archival forum; and second, to derive an empirically based method for determining the per cent ECP withhold. Based on the analysis presented, several points became evident. First, not every contract incurs an ECP; however, ECPs do occur, and not budgeting accordingly results in a serious shortfall, as shown in Table XI. Second, both the likelihood of an ECP and the additional amount incurred appear to be statistically independent of acquisition phase, branch of service, commodity, contract type or any other factor except the basic contract amount and the number of contract line item numbers (CLINs). Both of these variables equally affected the contract percentage increase due to an ECP. Finally, the logistic regression approach proved a poor predictor of whether a DoD contract will incur an ECP. However, it did provide the valuable insight that lower cost contracts appear statistically less likely to incur an ECP. Preliminary analysis suggests that this breakpoint might be around $100,000; however, further research is encouraged to examine this lower boundary.
As with any research, limitations exist for the results in this paper. Quality statistical analysis depends on quality data; therefore, any errors within CAPE's database pulled from EDA pass down to the sample database that formed the conclusions stated in this paper. Additionally, equation (1) is intended for a portfolio-managed approach to contracts in an organization. That is, an agency or manager overseeing a multitude of contracts must be able to move ECP withhold amounts from contract to contract as needed. In that context, the OLS model, as shown in Table XI, represents an almost balanced approach. Also, using equation (1) for contracts exceeding $164m in BY 2016 dollars would be model extrapolation, and we caution against such use. The field of changing requirements and their impact at the contract level is full of opportunity. One major recommendation is to use or add a different source of data. Adding data from the SARs might reveal program elements that increase the chance of all contracts within that program experiencing an ECP. Another source of information is earned value management (EVM) reports. EVM metrics could provide a snapshot of contract health, and research has shown that sentiment analysis of EVM and status reports of programs might provide insightful information and predictive capability (Freeman, 2013).
Finally, we suggest simultaneous analysis at the contract and program levels. In our experience, many programs start a new contract rather than adding requirements to an existing contract. This practice, while valid and legal, may skew the analysis if performed solely at the contract level: a program might experience cost growth by adding new contracts while its existing contracts show no increased cost. Overall, a broader and more holistic view is needed to accurately assess the impact of changing requirements on the final cost of a DoD program and the elements that affect its bottom line.