Analyzing operating and support costs for Air Force aircraft

Purpose – Recent legislation resulted in an elevation of operating and support (O&S) costs' relative importance for decision-making in Department of Defense programs. However, a lack of research in O&S hinders a cost analyst's ability to provide accurate sustainment estimates. Thus, the purpose of this paper is to investigate when Air Force aircraft O&S costs stabilize and to what degree. Next, a parametric O&S model is developed to predict median O&S costs for use as a new tool for cost analyst practitioners.
Design/methodology/approach – Utilizing the Air Force total ownership cost database, 44 programs consisting of 765 observations from 1996 to 2016 are analyzed. First, stability is examined in three areas: total O&S costs, the six O&S cost element structures and by aircraft type. Next, stepwise regression is used to predict median O&S costs per total active inventory (CPTAI) and identify influential variables.
Findings – Stability results vary by category but generally are found to occur approximately five years from initial operating capability. The regression model explains 89.01 per cent of the variance in the data set when predicting median O&S CPTAI. Aircraft type, location of lead logistics center and unit cost are the three largest contributing factors.
Originality/value – Results from this research provide insight to cost analysts on when to start using actual O&S costs as a baseline for estimates in lieu of analogous cost program data and also yield a new parametric O&S estimating tool designed as a cross-check to current estimating methodologies.

procurement) phases of a program's life cycle (Ryan et al., 2013). WSARA, however, brought incipient attention and emphasis to the operating and support (O&S) phase of the life cycle (Congress US, 2009). To be clear, it is not the requirement to develop and complete O&S cost estimates that has changed. Rather, it is the elevation of the relative importance of the O&S estimates for decision-making in relation to its acquisition counterparts that has changed. This historically dichotomous relationship is mirrored in the academic literature. In fact, an analysis of the DoD cost literature shows that acquisition cost behavior was the focus of over 130 studies from 1945 to 2009, while published analysis of O&S cost behavior during the same time rarely occurred (Ryan et al., 2012). Since the enactment of WSARA in 2009, however, a burgeoning O&S research stream has appeared in the literature as demonstrated in the works of Boito et al. (2012), Jones et al. (2014) and Ritschel and Ritschel (2016), amongst others.
Additionally, the practical importance of accurate O&S estimates is manifesting itself in programmatic decisions. For example, the decision to divest (or not) the U-2 in favor of the RQ-4 Global Hawk is framed not only by relative capability performance of the platforms but also by their expected O&S costs. The significance of cost in the decision process is clear: "as costs for operating the Global Hawk fell below those of the U-2, the service chose to go with the unmanned platform going forward" (Pomerleau, 2016). While the final decision on retiring the U-2 is in flux, as increased defense funding in the 2017 budget has resulted in both platforms being retained, the importance of O&S costs to decision-makers is evident.
Given the extant need for credible O&S estimates and their known historical inaccuracies (Ryan et al., 2013), we seek to provide improvements in the field of O&S estimating. Cost analysts use a range of estimating techniques, including parametric, analogy, engineering build-up or extrapolation from actuals (actual costs). As data become available and programs mature, analysts shift from techniques such as the analogy method to the extrapolation from actuals method (Mislick and Nussbaum, 2015). Current practice for O&S estimates relies heavily on the analogy method. The development of the Air Force Total Ownership Cost (AFTOC) database in 1996, which collects expenditures on the O&S costs of DoD platforms, provides a relatively recent repository that cost analysts can use to migrate from the analogy technique to the extrapolation from actuals method. But when can analysts make the transition? To transition to actual data, cost analysts need to understand when a steady state is achieved. Thus, we seek to answer this question by investigating O&S cost stability properties in US Air Force aircraft platforms.
While determining stability properties helps cost analysts working estimates on mature platforms, there is also a large segment of analysts performing estimates at varied stages in a program's life cycle. Therefore, the second part of the research develops a regression model (including the results of the stability analysis as an explanatory variable) to predict median O&S costs. The intent of this parametric model is not necessarily to replace current practices. Rather, the model is intended as a secondary technique, or crosscheck, to the analyst's primary methodology. This fulfills a critical role as cost analysts typically use multiple techniques to garner confidence in the final estimate (Government Accountability Office, 2009). The desired result is convergence around a similar number, thereby providing the decision-maker with the most reliable cost estimate possible.
The literature reveals limited inquiries into stability properties but does identify potentially important independent variables for inclusion in the parametric model developed in this research. Jones et al. (2014) examine the ratio of O&S costs to acquisition costs. A methodological assumption in their analysis is that O&S costs are deemed stable once 10 per cent of the planned procurement quantity is produced. The 10 per cent inclusion criterion is used to simplify their data set; however, there is no data-driven evidence to validate the claim. Dixon (2005) analyzes the effects of aging on commercial aircraft and finds that sustainment falls into three distinct phases: newness, mature and aging. Dixon discovers that the newness phase ends at approximately five to seven years, which may indicate the initiation of stability. Although commercial aircraft and military aircraft have distinctly different objectives, there are parallels to the analysis, and age is included as a variable in our model. While stability has not been explicitly analyzed in the O&S arena, it has been examined in earned value management (EVM) data. Christensen and Payne use the cost performance index in EVM as a tool to gauge stability in DoD contracts (Christensen and Payne, 1992). Petter et al. (2015) add to this research by investigating earned schedule and its application to stability while analyzing three definitions of stability. While EVM focuses on contract performance and is inherently different from the perpetual nature of O&S, the methodologies provide insight into how to approach the determination of stability for O&S costs.
In 2001, the Congressional Budget Office (CBO) developed a rudimentary regression model to predict O&S costs that uses variables from a 1990 RAND study (Hildebrandt and Sze, 1990). The 2001 study uses the nascent AFTOC database to create the model. The AFTOC database originated in 1996 resulting in only four years of data in their analysis (CBO, 2001). Key variables identified in the model include age, tempo and unit cost. Incorporating additional years of data and adding programmatic information provides an opportunity for a more robust model to be built.

Database and methodology
The primary data used in this research include annual O&S costs for 44 aircraft programs (765 observations) gathered from the AFTOC database. Research is limited to aircraft in the Air Force inventory and expenditures from 1996 to 2016 because AFTOC does not contain data prior to 1996 or expenditures from 2017 at the time of this analysis. AFTOC provides costs in base-year dollars and then-year dollars. For this research, all costs are converted with Office of the Secretary of Defense (OSD) inflation indices to base-year 2016 to remove the effects of inflation. Initial operational capability (IOC) dates are needed for standardization purposes and are found using selected acquisition reports. To ensure robustness of programs selected for analysis, they must have a total active inventory (TAI) of over 10 in at least one year of cost data, at least five years of cost data for a mission design series (MDS) to be included and an available IOC year. In the data gathering process, some MDS (variants) shared IOC dates which necessitated grouping. Initial screening of the base data set is shown in Table I, while Table II highlights the 44 programs used for analysis grouped by aircraft mission type from AFTOC.
Once data are collected and screened, costs are standardized by calculating cost per flying hour (CPFH) and cost per total active inventory (CPTAI) to control for the variance in hours flown or number of aircraft in the inventory. The three cost types explored are total O&S costs, CPFH and CPTAI. "Time from IOC" is standardized by subtracting the IOC year of a given MDS from the year of the O&S cost data. For simplicity, the IOC year is used, not the month. Data range from 1996 to 2016, but the aircraft included are at different points in their sustainment phase. For example, in 2010, the C-17A is 17 years from IOC, while the F-22A is only 5 years from IOC. When all data points are compiled, trends can be seen in terms of years from IOC for a specific program. The determination of stability for cost analysts relies on a year-to-year per cent cost difference for each year from IOC. In the analysis, the six cost element structures (CES) from the OSD, cost assessment and program evaluation (OSD CAPE) cost estimating guide, shown in Table III, are used to highlight differences in stability properties by CES. Total O&S costs are an aggregate of all six CES, but only the first five CESs are analyzed individually because of the variance in how indirect costs (CES 6) are levied on different programs. Indirect support costs are installation and personnel costs allocated on "a per capita or some other basis," which can be ambiguous for cost analysts (OSD CAPE, 2014). Once individual differences (percentage) are calculated for each year from IOC for all 44 programs, the mean difference for each year from IOC is determined.

Table III. OSD CAPE cost element structure (rows 4.0 to 6.0)
4.0 Sustaining support: Cost of system support activities that are provided by organizations other than the system's operating units
5.0 Continuing system improvements: Cost of system hardware and software modifications
6.0 Indirect support: Cost of support activities that provide general services that lack the visibility of actual support to specific force units or systems. Indirect support is generally provided by centrally managed activities that provide a wide range of support to multiple systems and associated manpower
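The standardization steps described above can be illustrated with a short sketch. It is not the authors' implementation: the function names and the input layout (a dictionary of costs keyed by years from IOC) are hypothetical, the sample figures are invented, and taking the absolute value of each year-to-year change before averaging is an assumption, since the text does not say how signs are treated.

```python
from collections import defaultdict
from statistics import mean


def cptai(total_os_cost, total_active_inventory):
    """Cost per total active inventory for one program-year."""
    return total_os_cost / total_active_inventory


def years_from_ioc(data_year, ioc_year):
    """Standardized time axis: year of the cost data minus the IOC year."""
    return data_year - ioc_year


def pct_differences(cost_by_year):
    """Year-to-year per cent difference for one program.

    cost_by_year maps years-from-IOC -> standardized cost (e.g. CPTAI).
    Only consecutive years are compared; absolute values are an assumption.
    """
    diffs = {}
    for yr in sorted(cost_by_year):
        prev = cost_by_year.get(yr - 1)
        if prev:  # skip missing (or zero) prior-year values
            diffs[yr] = abs(cost_by_year[yr] - prev) / prev * 100.0
    return diffs


def mean_pct_difference(programs):
    """Mean per cent difference across programs for each year from IOC."""
    pooled = defaultdict(list)
    for series in programs.values():
        for yr, diff in pct_differences(series).items():
            pooled[yr].append(diff)
    return {yr: mean(vals) for yr, vals in sorted(pooled.items())}


# Hypothetical programs with invented CPTAI values (not AFTOC data).
programs = {
    "A": {1: 100.0, 2: 120.0, 3: 126.0},
    "B": {2: 200.0, 3: 180.0},
}
result = mean_pct_difference(programs)
```

With these invented inputs, program A changes by 20 and 5 per cent and program B by 10 per cent, so the pooled means are 20 per cent at year 2 and 7.5 per cent at year 3. The F-22A example in the text corresponds to `years_from_ioc(2010, 2005)`, where the 2005 IOC year is the one implied by the text's own arithmetic.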
O&S data beyond 40 years from IOC are truncated, based on a reasonable assumption about how long an MDS will remain in operation. The 44 selected programs have years of data ranging from 1 to 57 years from IOC. This research aims to determine when stability occurs within the expected service life. Therefore, OSD CAPE's largest notional service life duration, which between fixed wing and rotary aircraft is 40 years, is the cutoff for truncation (OSD CAPE, 2014). This approach is consistent with the prior literature on O&S costs. Jones et al. (2014) used the OSD CAPE draft estimating guide when finding the ratio of O&S costs to development costs and applied service lives of 20-30 years for fighters and helicopters and 30-40 years for cargo, bomber and tanker aircraft. Given these two determinations of service life, separate analyses are conducted for up to 30 years from IOC and 30-40 years from IOC. This has the added benefit of examining potential aging effects in the data. The base data set has 765 program-year pairs (e.g. F-15A in 1998 or F-22A in 2008), while the truncated set has 681, which means that only 10.98 per cent of the data points are removed because of truncation.
The degree of stability is determined through testing per cent bounds. The process starts at a 20 per cent bound and finds the first year from IOC in which the mean per cent difference falls within this bound. That point is then declared the "year from IOC" at which stability occurs. The share of time the mean per cent difference stays within the 20 per cent bound is calculated from the stability point to 30 years and from the stability point to 40 years. If the mean per cent difference falls within the 20 per cent bound for more than 80 per cent of the time from the stability point to 30 years, then the same process is repeated with a 15 per cent bound. The process is repeated for the 10 and 5 per cent bounds if applicable, with the most restrictive bound being the best point of stability. This process is completed for all 18 cost combinations comprising the three cost types (total costs, CPFH and CPTAI) and six cost categories (Total O&S and CES 1-5). Again, CES 6 is used in the calculation of total O&S but not individually analyzed. The per cent stable to 40 years is not used in the selection of the best bound, but a decrease in the per cent of the time stable can reveal potential aging effects.
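The bound-testing procedure above can be sketched as follows. This is one reading of the text, not the authors' code. It assumes the input holds mean per cent differences already expressed as magnitudes, that "within the bound" means less than or equal to it, that the 80 per cent share is computed over the years with data between the stability point and the horizon, and that the search stops at the first bound that fails. All names and sample values are hypothetical.

```python
def stability_point(mean_diff, bound):
    """First year from IOC whose mean per cent difference is within the bound."""
    for yr in sorted(mean_diff):
        if mean_diff[yr] <= bound:
            return yr
    return None


def share_within(mean_diff, bound, start, horizon):
    """Fraction of years in [start, horizon] whose value is within the bound."""
    window = [v for yr, v in mean_diff.items() if start <= yr <= horizon]
    if not window:
        return 0.0
    return sum(v <= bound for v in window) / len(window)


def best_bound(mean_diff, bounds=(20, 15, 10, 5), horizon=30, min_share=0.80):
    """Tightest bound that stays within limits more than min_share of the time.

    Returns (bound, stability_point, share) or None if even the loosest
    bound is never reached or does not hold often enough.
    """
    best = None
    for bound in bounds:  # loosest to tightest
        point = stability_point(mean_diff, bound)
        if point is None or point > horizon:
            break
        share = share_within(mean_diff, bound, point, horizon)
        if share <= min_share:
            break
        best = (bound, point, share)
    return best


# Invented mean per cent differences by year from IOC (illustration only).
mean_diff = {1: 30, 2: 25, 3: 18, 4: 12, 5: 9, 6: 8, 7: 11, 8: 7, 9: 6, 10: 9}
selected = best_bound(mean_diff, horizon=10)
```

For the invented series, the 20 and 15 per cent bounds hold in every year after their stability points, the 10 per cent bound is first met at year 5 and holds in five of the six years from 5 to 10, and the 5 per cent bound is never reached, so the sketch selects the 10 per cent bound with a stability point of year 5. Calling the same function with `horizon=40` would give the per cent stable to 40 years used to look for aging effects.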
After the best bound selection, there is a stability point in terms of years from IOC and a per cent bound. Next, descriptive statistics are calculated for the mean per cent differences from the stability point to 30 years from IOC. Using the best bound and descriptive statistics, it can be determined which cost type and which categories exhibit the most or least stability. Understanding these steady-state properties, therefore, provides insight to cost analysts on the proper time to migrate from the analogy technique to the extrapolation from actuals method. The bounds also influence the cost risk modeling of the individual elements when developing a cost estimate. Descriptive statistics for 30-40 years from IOC are also calculated to highlight any aging effects.
This process is then repeated for each aircraft type. Aircraft are divided into the eight operational mission categories given by AFTOC: bomber, fighter/attack, helicopter, reconnaissance, special duty, training, transport/tanker and unmanned aerial vehicle (UAV). It is important to note that splitting the 44 programs into eight categories drastically reduces the number of data points for each aircraft type. Because of the small sample size, inferential tests are not conducted to demonstrate differences in aircraft type, but descriptive statistics can still provide insights. For simplicity, comparisons by aircraft category use the best metric for stability found at the aggregate level (i.e. CPFH, CPTAI or total costs).
The second part of this research creates a top-level regression model as a crosscheck to the cost analyst's primary estimation techniques for O&S costs. The dependent variable in this regression analysis is the total CPTAI. AFTOC provides additional programmatic information in addition to the cost data that are used to create independent, predictor variables. Examples of programmatic data provided in the database include average age of aircraft, location of lead logistics center, operational mission type and unit-cost, among others, discussed in the next section. Stability analysis results are also incorporated into the model as an additional predictor variable for the point of stability in terms of years from IOC.
Regression analysis requires a re-screen of the initial data set because of the addition of variables. Each program and "years from IOC" pair must have only one value for each potential predictor variable. This screen resulted in five grouped programs that have two different unit-costs and one grouped program with two different logistics centers. C-141B/C, KC-135E/R, MH-53J/M, T-38A/C and UH-1H/N are all programs that had major overhauls at some point in their service life that created a new MDS and an updated unit cost. Because of the programs being previously grouped by IOC year, these five are removed from regression analysis for having two unit-costs. The program with two logistics centers is the EC-130E/H and is included in the regression data set because location of logistics center is a binary variable that may influence the O&S cost. Table IV illustrates the re-screen of the base data set.
Prior to any model building, the re-screened regression database is separated into two groups: the model building data set and validation set. For multiple regression, a random sampling of the data uses approximately 80 per cent for the model building portion and 20 per cent for the validation set. Variables may or may not be included in the model depending on the individual significance and contingent on passing regression diagnostics. The mixed stepwise function in JMP Pro Version 12 is used to determine which predictor variables are initially included in the model. A significance level of 0.05 is the threshold to enter or exit the model. The assumption of normality is assessed using the Shapiro-Wilk test, while the assumption of constant variance is assessed with the Breusch-Pagan test. Both tests are conducted at the 0.05 level of significance. Failing to satisfy the assumption of constant variance can potentially be remedied with natural log transformation of the dependent and/or independent variables. The natural log transformation results in the predicted model value being the median response rather than a mean.
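The statement that a log-transformed model predicts a median rather than a mean can be made explicit with a short derivation. The notation here ($Y$ for CPTAI, $x_j$ for predictors, $\beta_j$ for coefficients, $\varepsilon$ for the error term, $\sigma^2$ for its variance) is introduced for illustration and is not taken from the paper, and the mean formula assumes normally distributed errors.

```latex
\begin{align*}
\ln Y &= \beta_0 + \sum_{j} \beta_j x_j + \varepsilon,
  \qquad \varepsilon \sim N(0, \sigma^2) \\
\operatorname{Med}(Y \mid x)
  &= \exp\Bigl(\beta_0 + \sum_{j} \beta_j x_j\Bigr)
  && \text{(the exponential is monotone and } \operatorname{Med}(\varepsilon) = 0\text{)} \\
\operatorname{E}(Y \mid x)
  &= \exp\Bigl(\beta_0 + \sum_{j} \beta_j x_j\Bigr)\, e^{\sigma^2/2}
  && \text{(mean of a lognormal variable)}
\end{align*}
```

Exponentiating a fitted value from the log-scale model therefore recovers the conditional median of CPTAI; under these assumptions the conditional mean is larger by the factor $e^{\sigma^2/2}$.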
Furthermore, multicollinearity, influential data points and outliers are investigated to prevent additional bias. Variance inflation factors (VIFs) highlight linear relationships among predictor variables, and a VIF higher than 5 indicates multicollinearity. Cook's distance detects influential data points that could be skewing the model, and any value greater than 0.5 is investigated thoroughly. Any studentized residual that is greater than three standard deviations from the mean is considered an outlier and must also be investigated further. The final assessment is the Bonferroni-Holm correction, which aims to control the family-wise error rate and reduces Type I error (false positives). Once all diagnostics are passed, the model building set is validated against the validation set using multiple criteria: mean absolute percentage error (MAPE), median absolute percentage error (MdAPE) and adjusted R². When the model is deemed internally valid, then all data points are combined to update the final model using the variables selected from the model building process.
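The two error metrics used for validation follow their standard definitions, sketched below. The function names and sample values are hypothetical, and errors are returned as fractions rather than percentages.

```python
from statistics import mean, median


def absolute_percentage_errors(actual, predicted):
    """|actual - predicted| / |actual| for each observation (as fractions)."""
    return [abs((a - p) / a) for a, p in zip(actual, predicted)]


def mape(actual, predicted):
    """Mean absolute percentage error."""
    return mean(absolute_percentage_errors(actual, predicted))


def mdape(actual, predicted):
    """Median absolute percentage error (less sensitive to a few large misses)."""
    return median(absolute_percentage_errors(actual, predicted))


# Invented values on a log scale, for illustration only.
actual = [10.0, 12.0, 11.0, 13.0]
predicted = [10.1, 11.88, 11.0, 13.26]
model_mape = mape(actual, predicted)
model_mdape = mdape(actual, predicted)
```

Because the model's response is a natural log, errors computed this way are relative to the logged value, not to CPTAI in dollars, which is worth keeping in mind when interpreting very small MAPE figures.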

Stability analysis
The stability analysis begins by examining the mean per cent differences for CPTAI at each year from IOC, up to 40 years (Figure 1). The number of programs used in the calculation of mean per cent difference for a specified "years from IOC" ranges from 8 to 24. The figure displays total CPTAI and the five O&S CESs CPTAI. Mean values are color coded from red to green in 5 per cent bins, where dark red is equivalent to greater than 20 per cent difference and dark green is equivalent to under 5 per cent difference. Tables for total O&S cost and CPFH are attached in the Appendix and demonstrate similar characteristics. Note that CES 4 and 5 do not fall under the 20 per cent threshold at any point and are, therefore, removed from further analysis. While it may seem capricious to remove two of the five categories, O'Hanlon (2018) determined that CES 1, 2 and 3 constitute the majority of O&S costs with a mean of 82.39 per cent and median of 80.53 per cent.
The best bound selection process and descriptive statistics for all combinations (minus CES 4 and 5) are summarized in Table V. Several conclusions can be drawn from the table. For instance, CES 1 (Manpower) and 2 (Unit operations) have much lower means and medians than CES 3 (Maintenance). This indicates that manpower and pay are fairly stable. Operation tempo and fuel consumption are usually planned in advance and may contribute to unit operations' costs having a more consistent nature as well. Maintenance, especially unplanned maintenance, is much harder to predict, and that may explain why less stability is exhibited in CES 3 across all cost combinations. The cost type (total, CPFH or CPTAI) that has the lowest best bounds is CPTAI, which displays lower means and medians than the other two cost types. In addition to the conclusions across cost type and cost categories, there appear to be aging effects in total costs and CPTAI but not for CPFH combinations. Aging can be seen when the per cent stable to 40 years from IOC is less than the per cent stable to 30 years from IOC. While it does not show the direction in which costs are going, it does show that there is less stability later in the sustainment phase. In addition, the mean and median values for CPFH are higher, which indicates instability.
The last segment of stability analysis searches for differences across aircraft categories. The sample size to calculate means drops sizably when splitting the 44 programs into eight different categories, which limits the fidelity of inferential statistics. In addition, not all aircraft platform types have data points for all years from IOC between 1 and 40. For this reason, bounds testing and descriptive statistics provided for the various aircraft platform types are tailored to the available data of the aircraft type. CPTAI was the best metric in the aggregate analysis, with stability demonstrated at the 10 per cent bound, a trend that appears the same across all platform types. For this reason, the research homes in on CPTAI as the basis for category comparison. Besides helicopters, which have their first year of data 11 years from IOC, all aircraft types exhibit stability properties by five years from IOC. As shown in Table VI, there are potentially three distinct groupings for the means of per cent differences after stability occurs. Bombers, fighter/attack, training and transport/tanker have total CPTAI means around 7 per cent and medians from 5 to 7 per cent. Reconnaissance and helicopters are at 8.55 and 9.55 per cent, respectively, with medians around 7 per cent. Special duty aircraft and UAVs appear to be different from the rest in the total CPTAI category with means around 11-12 per cent and medians from 10 to 12 per cent.

Multiple regression analysis
The second portion of analysis creates a regression model by utilizing knowledge from the aforementioned CBO/RAND study and builds upon it with additional programmatic information from AFTOC and results from the stability analysis of this research. The dependent variable in the model is the natural log of total O&S CPTAI. When running diagnostics, the Breusch-Pagan test for constant variance repeatedly failed; however, natural log transformation of CPTAI and two predictor variables provided a more constant variance of residuals. The initial independent variable set to be investigated is shown in Table VII.
Of the independent variables tested in the model building set, two binary variables were removed for insignificance during the stepwise process at the 0.05 level: contract vs organic logistics support and all binary variables for years from IOC. Of the remaining 13 predictor variables, all exhibited individual significance below the comparison-wise error rate of 0.05/13 ≈ 0.00385 for the Holm-Bonferroni correction. For the 80 per cent model building set, the MAPE is 0.0131 (1.31 per cent) and the MdAPE is 0.0095 (0.95 per cent). In the 20 per cent validation set, MAPE is 0.0108 (1.08 per cent) and MdAPE is 0.0071 (0.71 per cent). The adjusted R² values for the model and validation sets are 0.8878 and 0.902, respectively. The close similarities in absolute percentage error and adjusted R² values suggest a valid model.

Table VII (stability variable row). Description: derived from the stability portion of the research and indicates whether the program is considered stable or not; the metric used is 5 years from IOC for CPTAI. Source: findings from stability analysis.
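The Holm-Bonferroni comparison mentioned above can be sketched as a step-down procedure. This is the textbook form of the method, not the authors' JMP workflow, and the function name and p-values below are invented. With 13 predictors, the strictest threshold (applied to the smallest p-value) is 0.05/13, and passing it for every predictor is a sufficient condition for all of them to survive the correction.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return a list of booleans: True where the null hypothesis is rejected.

    P-values are tested from smallest to largest against alpha / (m - rank),
    where rank starts at 0; testing stops at the first failure.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # all larger p-values are retained as well
    return reject


# Invented p-values for four hypothetical predictors.
decisions = holm_bonferroni([0.001, 0.04, 0.03, 0.005])

# Strictest threshold for a family of 13 tests at alpha = 0.05.
strictest = 0.05 / 13
```

In the invented example, the two smallest p-values pass their thresholds (0.0125 and about 0.0167), the third (0.03 against 0.025) does not, and testing stops there.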
Combining all data points, the updated model has an R² value of 0.8901, which means that the model explains 89.01 per cent of the variance in the data set when predicting median total CPTAI. The model's largest VIF is 6.60 for the binary variable representing Warner Robins as the lead logistics center, which indicates there may be some minor multicollinearity in the model; all remaining values are below 5. The highest Cook's D is 0.367, which is below the 0.5 threshold for influential data points. The Shapiro-Wilk test rejects normality with a p-value of <0.001, and the Breusch-Pagan test rejects constant variance with a p-value of <0.001. Though the tests fail statistically, a more in-depth examination of the studentized residuals shows that the values are centered around the mean, and the residual vs predicted plot shows no apparent trend. Ordinary least squares regression is also robust against minor or moderate deviations from constant variance and normality (Cohen et al., 2003). Natural log transformation of the response variable as well as two predictor variables, unit-cost and tempo, provided a more constant variance of residuals. Final regression results are shown in Table VIII. The MAPE for the final model is 0.0126 (1.26 per cent) and MdAPE is 0.00913 (0.91 per cent), which indicates the model has a low relative error when predicting the natural log of total CPTAI.

Discussion and conclusion
This research seeks to ascertain where stability occurs in O&S costs and how it is exhibited using a variety of metrics. Overall, we find that CPTAI is the best metric when seeking to determine stability. The cost type that exhibits less stability is the CPFH metric, which could be because of the high variance in flight hours and differences in mission types. Recall that the OSD CAPE CES (Table III) is the structure used by analysts in constructing their estimates. Analysis of the CES illustrates that manpower (CES 1) and unit operations (CES 2) exhibit the highest degree of stability, while maintenance (CES 3) exhibits the lowest. At the top level, CPTAI reaches stability five years from IOC and coincides with Dixon's (2005) research on commercial aircraft exiting the newness phase when average aircraft age is five to six years. Dixon's research is standardized by average age and CPFH, but it presents similar results to our analysis. Our categorical analysis (using CPTAI as the metric) finds two to three distinct groupings, with stability typically occurring from 0 to 5 years from IOC. Practical applications of these findings for DoD cost analysts and decision-makers are straightforward. Programmatic decisions, such as the aforementioned U-2 vs Global Hawk decision, are based partially on out-year sustainment costs. Understanding the stability properties of these costs provides increased confidence in the estimates provided and the decisions upon which they are based.
In addition to analyzing stability properties, this research develops a robust regression model to predict median total O&S costs for aircraft. It is important to note that this tool is intended to be used as a crosscheck in conjunction with primary methods when creating estimates. The model has an R² of 0.8901, meaning it explains a large share of the variance in the data. Table IX displays the relative contribution for the variable types as determined by standardized beta estimates. Aircraft type, location of lead logistics center and unit cost are the three largest contributing factors to the model and comprise over 74 per cent of the relative weights in the total model. Standard practice is for analysts to use a multitude of techniques to enhance confidence in the estimates provided. The model developed here serves as another technique for the analyst's toolkit.
There are several limitations to this research. Perhaps the most significant limitation in determining stability is that the research standardizes by years from IOC. In theory, it is a reasonable way to standardize, but in reality, programs distinguish initial capability differently. Programs may declare IOC earlier than when they should because of schedule or other pressures. In other cases, programs incur O&S costs for multiple years before declaring IOC which skews the data to show stability sooner. This limitation is well documented in other works such as Kozlak et al. (2017) and Jimenez et al. (2016), who also used IOC dates as a standardization point. A second limitation with respect to IOC is that the year is used rather than the month for IOC dates for simplicity reasons. Finally, our investigation is limited to only Air Force aircraft. While the stability results coincide with Dixon's (2005) research with commercial aircraft, it is impossible to tell if these characteristics are the same as those of other service aircraft or other types of programs.
There are multiple areas where future research is warranted. Stability properties of other military services' weapon systems could be investigated using the methodology of this paper. The Visibility and Management of Operating and Support Costs (VAMOSC) database contains the necessary O&S costs for other services. Additionally, an investigation into CES 6, indirect costs, to develop an allocation model would allow future researchers to use this CES in their analysis. Other future research could examine O&S cost behavior in times of war in comparison to peacetime. The new emphasis placed on understanding O&S costs from the WSARA legislation has resulted in a nascent research stream that is growing. This research adds to the body of knowledge by providing cost analysts and decision-makers with tools to help determine when Air Force aircraft can be considered to have stabilized O&S costs. A determination of stability allows for better budgeting practices because cost analysts can then switch from using analogous programs to extrapolating from actual data. In addition, we provide a new parametric model that predicts median O&S costs for programs in varying stages of the life cycle. Convergence of multiple techniques around a similar estimate provides confidence in the estimate and in the decisions based upon it.

Figure A1.