Statistical validation of critical aspects of the Net Promoter Score

Purpose – Although the Net Promoter Score (NPS) index is simple, it has weaknesses that make its interpretation misleading. The main criticism is that identical index values can correspond to different levels of customer loyalty. This makes it difficult to determine whether the company is improving or deteriorating between two different years. The authors describe the application of statistical tools to establish whether identical values may or may not be considered similar under statistical hypotheses.
Design/methodology/approach – Equal NPSs with a “similar” component composition should have a two-way table satisfying the marginal homogeneity hypothesis. The authors compare the marginals using a cumulative marginal logit model that assumes a proportional odds structure: the model has the same effect for each logit. Marginal homogeneity corresponds to a null effect. If the marginal homogeneity hypothesis is rejected, the cumulative odds ratio becomes a tool for measuring the proportionality between the odds.
Findings – The authors propose an algorithm that helps managers in their decision-making process. The authors' methodology provides a statistical tool to recognize customer base compositions. The authors suggest a statistical test of the marginal distribution homogeneity of the table representing the index compositions at two times. Through the calculation of cumulative odds ratios, the authors evaluate the hypothesis of equality of the NPS.
Originality/value – The authors' contribution provides a statistical alternative that can be easily implemented by business operators to fill the known shortcomings of the index in the customer satisfaction context. This paper confirms that although a single number summarizes and communicates a complex situation very quickly, the number is ambiguous and unreliable if not accompanied by other tools.


Introduction
Customer satisfaction and retention are very important factors for companies that work in increasingly competitive markets.Following Arora and Narula (2018), "Customer satisfaction is mainly derived from the physiological response with the perceptual difference gap between expectation before consumption and practical experience after consumption of service or products.It implies an accumulated temporary and sensory response." The literature is full of proposals for methods to measure customer satisfaction; see, among others, Ngo (2015).Measurement can be approached through the use of various models and methods, of which the best known are Net Promoter Score (NPS), National Customer Satisfaction Index (NCSI), American Customer Satisfaction Index (ACSI), European Performance Satisfaction Index (EPSI), Service Quality (SERVQUAL), probit/logit model, Multicriteria Satisfaction Analysis (MUSA) and statistical regression models based on latent variables.Note that many of these approaches may also involve the use of articulated questionnaires.
In governance and marketing processes aimed at maximizing a company's success, customer loyalty is of paramount importance, a process that is closely linked to customer satisfaction.In fact, these processes have an impact on satisfaction, and satisfied customers become loyal ones (Arora and Narula, 2018).Measuring the level of satisfaction of a customer with statistical models can be very complex and difficult (Zanella, 1998;De Luca, 2006).The models normally used may not be easy to implement.The variables that govern the mechanisms of customer choice and satisfaction are generally very difficult to measure and model.
Furthermore, the quest to consolidate the company's position in the market and win more market share cannot be separated from the need to understand what the customers want.Their needs change over time, as do their requirements, and this pushes the companies toward a continuous search for improvement as indicated by the philosophy of Total Quality Management (TQM).TQM is a quality-based strategic tool of management and characterizes the basis for successful organization that ensures the success of organizations in the competitive economy.If TQM is effectively evidenced in the quality of the product, customer loyalty is automatically enhanced, Worlu et al. (2019).Deming (1986) perceives TQM as a set of management practices that enable companies to increase their productivity and quality by having the ability to create constancy of purpose for improving products and services and stop reliance on inspection to attain quality.The Plan-Do-Check-Act (PDCA), also known as the "Deming wheel", had its origin with Deming's lecture in Japan in 1950 by modifying the Shewhart cycle introduced in 1939.The PDCA cycle (Figure 1) is a widely utilized management methodology in those companies aiming at continuous improvement.
In this context, the customer satisfaction methodologies already indicated also fit in, as the NPS does.The NPS, introduced by Reichheld (2003) and then revised by himself (2011), fits in as a new resource that is agile to use (it is based on a single question) and, above all, that leverages the word-of-mouth (WOM).Loyalty is reflected when customers say positive things about the firm, intend to do business with the company and consider that particular company their first choice.In an increasingly globalized world where e-commerce is expanding rapidly, WOM seems to be a winning aspect for companies that increasingly rely on asking their customers and buyers for ratings to be published online.
The basis of the NPS is the idea that a satisfied customer would be willing to recommend the brand to friends and acquaintances.Reichheld believes that WOM recommendations are a useful, powerful and simple tool for measuring the degree of success of a brand and the degree of its customer loyalty.
The customer is asked a single question: "How likely is it that you would recommend us to a friend or colleague?" The response uses a scale of 11 points, from 0 (indicating "I probably won't recommend it") up to 10 (indicating "I will most likely recommend it"). The NPS takes into account the responses to this single question. In fact, Reichheld maintains that a higher level of customer satisfaction, and consequently loyalty, will result in a higher score in response to the question.
The scale is divided into three clusters: scores of 9 and 10 indicate clients considered promoters, scores of 7 or 8 are considered neutral or passive clients and scores of 6 or lower are considered detractors (see Figure 2).
Of the three groups of scores identified, only two are used to calculate the NPS:

$$\mathrm{NPS} = p_{pro} - p_{det},$$

where $p_{pro}$ and $p_{det}$ are the proportions of promoters and detractors among the respondents. The NPS measure theoretically ranges from −1 (no promoters and all respondents are detractors) to +1 (all respondents are promoters), although typical values are in the range 0.3-0.4. Obviously, the value can be read as a percentage.
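As a minimal sketch of the calculation just described, the following Python function classifies 0-10 ratings into the three clusters and returns the NPS as promoters minus detractors. The function name and the sample ratings are hypothetical and only serve as an illustration.

```python
def net_promoter_score(scores):
    """Return (nps, counts) for integer ratings on the 0-10 scale.

    Clusters follow the text: 9-10 promoters, 7-8 passives, 0-6 detractors.
    """
    counts = {"detractor": 0, "passive": 0, "promoter": 0}
    for s in scores:
        if not 0 <= s <= 10:
            raise ValueError(f"rating out of range: {s}")
        if s >= 9:
            counts["promoter"] += 1
        elif s >= 7:
            counts["passive"] += 1
        else:
            counts["detractor"] += 1
    n = sum(counts.values())
    if n == 0:
        raise ValueError("at least one rating is required")
    # Passives enter only through the denominator n.
    nps = (counts["promoter"] - counts["detractor"]) / n
    return nps, counts


# Hypothetical ratings from ten respondents.
ratings = [10, 9, 9, 8, 7, 6, 3, 10, 9, 5]
nps, counts = net_promoter_score(ratings)
print(nps, counts)
```

Passive respondents affect the result only through the sample size, which is why different compositions can share one NPS value.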
The simple nature of the NPS index has made it very popular and widely used, but it has also generated considerable disagreement.It is clear that this measure has pros and cons.
This paper does not have the ambition to provide a new tool that can replace the well-known NPS; instead, it focuses attention on the indiscriminate use of the score. The aim of the work is to present a statistical methodology already known in the literature (the marginal homogeneity model and the cumulative odds ratio) that, when combined with the NPS index, allows a correct reading of its value. Although the proposed method does not correct the known structural weaknesses of the index, it allows us to begin to answer some of the criticisms raised by allowing an objective reading (see Subsection 1.1).

Net Promoter Score critical issues
The introduction of the NPS index, in spite of considerable criticism in the scientific community, turns out to be a tool that is easy to implement even by those without specific statistical knowledge.For business operators, the evaluation of the number of potentially satisfied customers (promoters) is easy.Their satisfaction is measured indirectly through the score they give to the possibility of suggesting the brand/product to other possible buyers.This mechanism is believed to trigger a growth/decline process of the company's image on the market with the consequent increase/decline of customers.
Reading this index should help the company to understand not only its position in the market relative to its competitors, but also whether its position has been improving (detractors or passives becoming promoters) or worsening (promoters moving to the position of detractors or passives).Extensive debate has been conducted in the literature regarding the fact that the so-called move from one "state" to the next may not be easy to detect, i.e. the indicator does not provide any insight into the decision-making process or the motivation for the customer to move from one state to the next.Ultimately, there is little doubt that a detractor is unlikely to become a promoter.It is more reasonable to expect that it is the passives (ignored in the calculation of the NPS) who can change the state by altering the state of affairs and the value of the index.

Validation of NPS's critical aspects
Apple, Amazon, American Express, Avis, HP, Sky and IBM are among the many prominent adopters of NPS. The benchmark is popular for its simplicity, and Reichheld claims it correlates to company growth. Critics contend that this is not the case (Sharp, 2006; Pingitore et al., 2007; East et al., 2011; Eskildsen and Kristensen, 2011; Kristensen and Eskildsen, 2014). In particular, the 11-point scale is argued to have lower predictive validity than other scales (Schneider et al., 2008), the segmentation of promoters/passives/detractors is arbitrary and other questions may be better predictors of growth rates, as reported by Jeff Sauro [1] and Richard Evensen [2] in their blogs:
(1) The single question is not the most important in terms of customer satisfaction: this means that the NPS is surely less accurate than composite customer satisfaction indices based on, for example, three questions;
(2) The NPS does not accurately differentiate promoters and detractors: the composition of the three classes proposed by Reichheld is not supported statistically;
(3) The NPS fails to predict loyalty behavior;
(4) The NPS performs worse than satisfaction and liking questions;
(5) The NPS performs worse than other scales;
(6) The scoring inflates the margin of error: by converting an 11-point scale into a 2-point scale of detractors and promoters, information is lost. Throwing out the "passive" clients means that the organization misses the opportunity to work on those customers that are easiest to move upward to promoters.
Despite enduring managerial popularity, academics remain skeptical of NPS, citing methodological issues and ongoing concerns with NPS measurement. In particular, Eskildsen and Kristensen (2011) and Kristensen and Eskildsen (2014) believe that NPS is not a reliable indicator of effective customer retention. The ability of the NPS to really measure customer satisfaction and, consequently, loyalty to the brand is increasingly being questioned. In fact, there is no evidence linking the growth/decrease of the index to an equivalent growth/decrease in the business volume of the company. The single question used to compute the NPS does not consider the psychological variables that lead to the purchase and repurchase of a specific product/service. Indeed, the consumers who buy durable goods exhibit different behavior from those who buy consumer goods. There is no focus on the customer's intention to eventually buy the product again in the future, only on his/her propensity to suggest the brand to friends or acquaintances. Mecredy et al. (2018) and Baehre et al. (2022) revisited the use of NPS as a predictor of short-term sales growth through empirical investigations, concluding that the methodological concerns raised by academics are valid.
Furthermore, there are considerable differences in the different markets where companies operate. Likewise, the socioeconomic variables used to describe customers are not taken into account. There is also a complete absence of the "do not know" mode in the scale of possible answers, which removes the potential for the respondent to express neutrality. Companies operating in different markets and having to deal with different dynamics cannot readily compare themselves using the NPS index. Similar values of the NPS index for companies operating in different markets could have completely different meanings in terms of affirmation and the acquisition of market share. However, it is not even clear how one can think of comparing the value of the NPS index between companies operating in the same market. If a company has a higher NPS index value, how should this result be interpreted? And if the value of the index for a given company increases over time, does that mean that the market position is being consolidated and consequently that profits will increase? These multiple aspects are not taken into account in the structure of the NPS.
TQM 35,9
A further criticism of comparing the NPS of similar companies operating in different countries is that some countries are more accustomed to using the full scale of marks from 0 to 10 following habits formed at school. Nations such as Great Britain or the Scandinavian countries label scores of 9 and 10 as "excellent" due to their cultural heritage; Italian high schools, on the other hand, follow the standard that a mark of 8 out of 10 is considered an "excellent" grade. Kristensen and Eskildsen (2014) suggested a different distribution of respondents, such that scores from 0 to 4 are attributed to detractors, scores from 5 to 7 are passives and scores from 8 to 10 indicate promoters. This type of clustering will distribute the interviewees in more homogeneous groups. Note that the most well-known and accredited customer satisfaction measurement indicators, such as the EPSI rating or ACSI, use a 10-point score scale, which is considered more efficient.
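To make the effect of the clustering choice concrete, the sketch below computes the index for the same ratings under Reichheld's cut-offs (0-6, 7-8, 9-10) and under the alternative cut-offs attributed to Kristensen and Eskildsen (0-4, 5-7, 8-10). The function name and the ratings are hypothetical.

```python
def nps_with_cutoffs(scores, detractor_max, promoter_min):
    """NPS when ratings <= detractor_max are detractors
    and ratings >= promoter_min are promoters."""
    promoters = sum(1 for s in scores if s >= promoter_min)
    detractors = sum(1 for s in scores if s <= detractor_max)
    return (promoters - detractors) / len(scores)


# Hypothetical ratings from ten respondents.
ratings = [10, 9, 8, 8, 8, 7, 6, 5, 5, 3]

reichheld = nps_with_cutoffs(ratings, detractor_max=6, promoter_min=9)
alternative = nps_with_cutoffs(ratings, detractor_max=4, promoter_min=8)
print(reichheld, alternative)
```

With these illustrative ratings the sign of the index changes between the two schemes, which shows how strongly the reading depends on where the cut-offs are placed.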
Given the structure of the NPS index, it is not even clear how the scores should be interpreted.If developed for all possible combinations of percentages in the three clusters, a perfectly symmetrical triangular structure is obtained.This suggests that identical NPS values can be obtained with profoundly different compositions of the percentages involved in the calculation.How should this result be interpreted?Can identical index values indicate the same business performance even if the results derive from different percentage compositions of detractors and promoters?
In their critical review of the NPS, Fisher and Kordupleski (2019) highlight five further problems with the index:
(1) The NPS provides no data on how a company can improve;
(2) The NPS focuses only on keeping customers, not on winning new customers;
(3) There is no such thing as a "passive" customer;
(4) The NPS provides no competitive data;
(5) The NPS is internally focused, not externally focused.
They also provide recommendations on how to avoid these problems.
Despite these criticisms, the NPS remains popular because it is well marketed, easy to understand and its model makes intuitive sense: every organization wants more promoters than detractors.
In this paper, we describe the application of statistical tools to the NPS to establish whether identical values of the NPS index produced by different compositions of customers may or may not be considered similar under statistical hypotheses.
The remainder of this paper is organized as follows.Section 2 recalls some statistical aspects of the NPS already reported in the literature.In particular, we report the proposal presented by Rocks (2016) regarding the confidence interval of the index and the research of Capecchi and Piccolo (2017) on the distribution of NPS.In Section 3, we present a critical comparison of similar NPSs.Specifically, in Subsection 3.1, we analyze equal scores at two different points in time, but referring to indices generated with different compositions.Section 4 presents the methodology we use to introduce our proposal.The marginal homogeneity test described in Subsection 4.1 provides a statistical validation of equal NPSs at two different points in time.Subsection 4.2 goes further and suggests that the cumulative odds ratio should be adopted to establish the proportionality between the odds of different outcomes.Results are reported in Section 5. Finally, Section 6 presents the discussion, the implications and further research.

Statistical aspects of the NPS
The simplicity of the NPS means that it is widely used, despite being heavily criticized. However, only a few recent papers have addressed inferential procedures for the NPS, in particular Rocks (2016) and Capecchi and Piccolo (2017).

Rocks (2016) describes the properties of the NPS starting from the definition of its distribution law. The goal is to compare different confidence intervals. The main difficulty relates to the definition of the NPS distribution, as many trinomial laws can be suitably adapted. To calculate the variance of the NPS, $\sigma^2_{NPS}$, it seems appropriate to use the difference of two proportions (Gold, 1963; Goodman, 1965), giving the following formula:

$$\sigma^2_{NPS} = p_{pro} + p_{det} - (p_{pro} - p_{det})^2,$$

where $p_{pro}$ and $p_{det}$ represent the proportions of promoters and detractors, respectively. Different approaches for determining the confidence interval for the NPS were presented by Rocks. Among these, Wald's confidence interval, which is based on Laplace's proposal (de Laplace, 1812), stands out:

$$\widehat{\mathrm{NPS}} \pm z_{\alpha/2} \sqrt{\frac{\sigma^2_{NPS}}{n}},$$

where $z_{\alpha/2}$ is the standard normal distribution quantile and $n$ is the sample size. An alternative proposal is the adjusted Wald interval introduced by Agresti and Coull (1998) and subsequently modified by Agresti and Min (2005) for matched pairs in a 2 × 2 contingency table:

$$\widehat{\mathrm{NPS}} \pm z_{\alpha/2} \sqrt{\frac{\hat{\sigma}^2_{NPS}}{\hat{n}}},$$

where $\hat{\sigma}_{NPS}$ and $\hat{n}$ are the adjusted estimates. Analogous to the Wald method is the Goodman method (Goodman, 1964). Bonett and Price (2012) presented an adjusted Wald interval for matched pairs and 2 × 2 tables, which introduces a system of weights for those cells involved in the calculation. Alternatively, it is possible to define a confidence interval for the NPS by implementing iterative procedures based on various score tests, such as those based on the original proposal of Wilson (1927), the iterative score method introduced by Tango (1998), which is itself a modification of the test introduced by Agresti and Min (2005), or the May and Johnson (1997) score method. In conclusion, Rocks advises against the use of the Wald and Goodman methods, as they perform poorly. On the contrary, he states that the adjusted Wald method and the iterative score method perform very well, guaranteeing good levels of coverage.
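For illustration only, the plain Wald interval can be sketched in a few lines of Python. The sketch assumes the variance of a difference of two multinomial proportions, p_pro + p_det - (p_pro - p_det)^2, divided by n. The function name is hypothetical, and the counts (72 promoters and 22 detractors out of 100) are the Year 1 figures quoted for Table 6 later in the paper. Since Rocks reports that the Wald interval performs poorly, this is a baseline and not a recommended procedure.

```python
import math
from statistics import NormalDist


def nps_wald_interval(n_promoters, n_detractors, n, confidence=0.95):
    """Plain Wald interval for the NPS (illustrative baseline only)."""
    p_pro = n_promoters / n
    p_det = n_detractors / n
    nps = p_pro - p_det
    # Assumed per-respondent variance of the difference of two
    # proportions from the same multinomial sample.
    variance = p_pro + p_det - nps ** 2
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * math.sqrt(variance / n)
    return nps, nps - half_width, nps + half_width


nps, lower, upper = nps_wald_interval(72, 22, 100)
print(f"NPS = {nps:.2f}, 95% Wald interval = ({lower:.3f}, {upper:.3f})")
```

With only 100 respondents the interval is wide, so two index values that look different may not be distinguishable.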
In the paper of Capecchi and Piccolo (2017), the authors search for the distribution of NPS based on a convenient structure of the response patterns.They assume a parametric mixture for the responses and verify the behavior of NPS over the parameter space.From a statistical point of view, they consider NPS index as an estimate of the mean value of a discrete random variable whose probabilities are generated by a distribution expressing the graduated opinions of a sample of respondents on an ordinal scale.In particular, they assume that ordinal responses of the customer judgments/opinions are generated by a CUB (Combination of discrete Uniform and shifted Binomial) model as in Piccolo ( 2003) and D' Elia and Piccolo (2005).They show that infinitely many CUB models refer to the same NPS and that the uncertainty always present in human decisions as well as the heterogeneity of the respondents may largely affect the NPS value.
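The claim that many CUB models share the same NPS can be explored numerically. The sketch below assumes the usual CUB parametrization, P(R = r) = π C(m-1, r-1) (1-ξ)^(r-1) ξ^(m-r) + (1-π)/m for r = 1, ..., m, and maps category r to the score r - 1 on the 0-10 scale; that mapping, the function names and the parameter values are assumptions made for illustration. For a fixed mixing weight π, a bisection search finds the ξ that yields a target NPS.

```python
from math import comb


def cub_pmf(pi, xi, m=11):
    """CUB probabilities for categories r = 1..m (assumed parametrization)."""
    return [
        pi * comb(m - 1, r - 1) * (1 - xi) ** (r - 1) * xi ** (m - r)
        + (1 - pi) / m
        for r in range(1, m + 1)
    ]


def nps_from_pmf(pmf):
    """NPS when pmf[k] is the probability of score k on the 0-10 scale."""
    return sum(pmf[9:]) - sum(pmf[:7])


def xi_for_target_nps(pi, target, tol=1e-10):
    """Bisection on xi; for fixed pi the NPS decreases as xi increases."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if nps_from_pmf(cub_pmf(pi, mid)) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


target = 0.3
models = [(pi, xi_for_target_nps(pi, target)) for pi in (0.9, 0.6)]
for pi, xi in models:
    print(pi, round(xi, 4), round(nps_from_pmf(cub_pmf(pi, xi)), 6))
```

Each pair (π, ξ) found this way describes a different response distribution with the same index value.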
The papers by Rocks (2016) and Capecchi and Piccolo (2017) represent significant proposals in which some statistical properties of the NPS index are investigated. This certainly leads to a more accurate description of the index itself but does not overcome all of the criticisms of it. Our proposal stands alongside that of the cited authors with the aim of investigating, through appropriate statistical procedures, the composition of the index so that companies can implement the appropriate corrective/improvement actions.

Critical comparison of similar NPSs
As we have already stated, the same NPSs can represent (very) different situations.
Figure 3 displays all the possible values assumed by the NPS index (from −1 to +1) corresponding to all possible numbers of detractors (from no detractors to all respondents being detractors). As we can see from Figure 3, different compositions of the score can give the same result. For example, an NPS of 0.3 can be achieved with detractor percentages from 0% to slightly less than 40%. This raises the question of whether it is reasonable to compare companies with the same NPS while ignoring the percentages of promoters and detractors. More specifically, what conclusions can we draw from the comparison of two (possibly similar) scores for the same company at different points in time, without considering the evolution of these percentages?
This section focuses on comparing two NPSs for the same company at two different points in time, $t_1$ and $t_2$.
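The range of compositions behind a single index value can be enumerated directly. The sketch below lists every split of 100 respondents into detractors, passives and promoters that gives an NPS of 0.3; the function name and the choice of 100 respondents are illustrative assumptions.

```python
def compositions_with_nps(n, nps_points):
    """All (detractors, passives, promoters) counts among n respondents
    for which promoters - detractors equals nps_points."""
    result = []
    for detractors in range(n + 1):
        promoters = detractors + nps_points
        passives = n - detractors - promoters
        if 0 <= promoters <= n and passives >= 0:
            result.append((detractors, passives, promoters))
    return result


combos = compositions_with_nps(100, 30)  # NPS = 30 / 100 = 0.3
print(len(combos), combos[0], combos[-1])
```

The enumeration runs from no detractors (with 70 passives) up to 35 detractors (with no passives), consistent with the range of slightly less than 40% read from Figure 3.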

Composition of the NPS
We consider a company with ratings from 100 customers and its NPS in two consecutive years, Year 1 and Year 2. Consider the situations described in Tables 1 and 2.
Note that the company described in Table 1 has the same NPS in both years. The composition of detractors, promoters and passive customers is also the same in both years: each customer confirms their opinion over time.

The company described in Table 2 also has the same NPS in both years, in this case $\mathrm{NPS}_{\mathrm{Year}\,1} = \mathrm{NPS}_{\mathrm{Year}\,2} = 0.5$. However, the composition is quite different in each year. The customers who are detractors in Year 1 become passive in Year 2, which is good for the company, but 28% of the customers who are promoters in Year 1 shift to passive in Year 2, which is not so good for the company. Obviously, the situation in Table 2 is much more realistic than that represented by Table 1.
These two examples highlight the indiscriminate use of the index without evaluating its composition.However, one may reply that the absolute number of promoters and detractors in the two years appears quite different, although it is the same 100 customers.This means that there is some signal that something has changed over time.
In particular, let us consider Tables 3 and 4. Note that Table 3 has the same marginals as in Table 2 and equal NPSs in Year 1 and Year 2 (0.5).However, only about 9% of the customers who are detractors in Year 1 confirm their opinion in Year 2 ; the remainder are split between passive customers and promoters in Year 2 .This is a very good result for the company!Looking only at the NPS value, this result is not detected and, in particular, considering just the NPS values across the years does not highlight the evolution of customers in Tables 2 and 3.
Table 4 presents a situation in which the company has two similar NPSs in the two years (0.52 in Year 1 and 0.5 in Year 2). Note that 100% of the detractors in the first year move and become promoters in the next year, while 31% of the promoters in Year 1 change their evaluation in Year 2. Again, these changes in customers' opinions of the company do not emerge from a simple observation of the NPS.
The situations highlighted in Tables 3 and 4 are clearly borderline case studies.In reality, it will be quite difficult to find a detractor of a company that becomes a promoter from one year to the next.The objective of these considerations is a mathematical study of the NPS index, and these situations illustrate the limitations of the indicator itself.
In this subsection, we have highlighted the different compositions of detractors, passive customers and promoters that can produce similar NPSs from a descriptive point of view.In the next section, we consider the situation from an inferential perspective.

Methodology
A statistical validation of equal NPSs at two different points in time can be achieved by looking at the marginal data in the tables presented in the previous subsection.Equal NPSs with a "similar" composition of components should have a two-way contingency table that satisfies the marginal homogeneity hypothesis.

Marginal homogeneity
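Let $(\mathrm{NPS}_{\mathrm{Year}\,1}, \mathrm{NPS}_{\mathrm{Year}\,2})$ denote the two responses of a randomly selected matched set. With three response categories, a contingency table with 3 × 3 cells summarizes the possible outcomes.
Let $j = (j_1, j_2)$ denote the cell containing $\mathrm{NPS}_{\mathrm{Year}\,t} = j_t$, $t = 1, 2$. Let $\pi_j = P(\mathrm{NPS}_{\mathrm{Year}\,t} = j_t,\ t = 1, 2)$ be the joint distribution of $(\mathrm{NPS}_{\mathrm{Year}\,1}, \mathrm{NPS}_{\mathrm{Year}\,2})$. Then,

$$P(\mathrm{NPS}_{\mathrm{Year}\,1} = j) = \pi_{j+}, \qquad P(\mathrm{NPS}_{\mathrm{Year}\,2} = j) = \pi_{+j},$$

where the subscript $j$ is in position $t$ and the subscript $+$ denotes the sum over that index.
Note that $\{P(\mathrm{NPS}_{\mathrm{Year}\,t} = j),\ j = 1, 2, 3\}$ is the marginal distribution for $\mathrm{NPS}_{\mathrm{Year}\,t}$ [3]. This two-way table satisfies marginal homogeneity if

$$P(\mathrm{NPS}_{\mathrm{Year}\,1} = j) = P(\mathrm{NPS}_{\mathrm{Year}\,2} = j), \qquad j = 1, 2, 3.$$

Tests of marginal homogeneity have been studied for binary contingency tables and extended to larger tables (Agresti, 2013, Ch. 11).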
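The two marginal distributions and the corresponding NPS values can be read off a 3 × 3 table of counts as in the sketch below. The table is hypothetical (rows are Year 1, columns are Year 2, categories ordered detractor, passive, promoter) and is not one of the paper's tables; it was chosen so that both years have the same NPS while the marginals clearly differ.

```python
def marginals(table):
    """Row (Year 1) and column (Year 2) marginal proportions of a square table."""
    n = sum(sum(row) for row in table)
    k = len(table)
    year1 = [sum(table[i]) / n for i in range(k)]
    year2 = [sum(table[i][j] for i in range(k)) / n for j in range(k)]
    return year1, year2


def nps(marginal):
    """NPS from a marginal ordered as (detractor, passive, promoter)."""
    return marginal[2] - marginal[0]


# Hypothetical counts for 100 customers.
table = [
    [0, 20, 0],   # Year 1 detractors: all become passive
    [0, 10, 0],   # Year 1 passives: all stay passive
    [0, 20, 50],  # Year 1 promoters: 20 become passive, 50 stay
]
year1, year2 = marginals(table)
print(year1, nps(year1))
print(year2, nps(year2))
```

Both years give an NPS of about 0.5, yet the marginal distributions are (0.2, 0.1, 0.7) and (0.0, 0.5, 0.5), so marginal homogeneity does not hold.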

In our case of ordinal variables, we compare the marginals using a cumulative marginal logit model:

$$\mathrm{logit}[P(\mathrm{NPS}_{\mathrm{Year}\,t} \le j \mid x_t)] = \alpha_j + \beta x_t, \qquad t = 1, 2;\ j = 1, 2, \tag{1}$$

where $x_1 = 0$, $x_2 = 1$ and $\mathrm{logit}[P(\mathrm{NPS}_{\mathrm{Year}\,t} \le j \mid x_t)]$ denotes the so-called cumulative logit:

$$\mathrm{logit}[P(\mathrm{NPS}_{\mathrm{Year}\,t} \le j \mid x_t)] = \log \frac{P(\mathrm{NPS}_{\mathrm{Year}\,t} \le j \mid x_t)}{P(\mathrm{NPS}_{\mathrm{Year}\,t} > j \mid x_t)}.$$

The logit is defined only for $j = 1, 2$, because $P(\mathrm{NPS}_{\mathrm{Year}\,t} \le 3) = 1$. Each cumulative logit uses all three response categories. Note that this model simultaneously uses two cumulative logits for $\mathrm{NPS}_{\mathrm{Year}\,t}$, $t = 1, 2$. Following Eq. (1), each cumulative logit has its own intercept $\alpha_j$. The $\alpha_j$ are increasing in $j$, because $P(\mathrm{NPS}_{\mathrm{Year}\,t} \le j)$ increases in $j$ and the logit is an increasing function of $P(\mathrm{NPS}_{\mathrm{Year}\,t} \le j)$.
Usually, the $\alpha_j$ intercepts are not of interest except for computing response probabilities.
The parameter estimates yield estimated logits and hence estimates of $P(\mathrm{NPS}_{\mathrm{Year}\,t} \le j \mid x_t)$ or $P(\mathrm{NPS}_{\mathrm{Year}\,t} > j \mid x_t)$. It is worthwhile to note that this model gives stochastically ordered marginal distributions, with $\beta > 0$ indicating that $\mathrm{NPS}_{\mathrm{Year}\,1}$ tends to be higher than $\mathrm{NPS}_{\mathrm{Year}\,2}$.
Marginal homogeneity corresponds to $\beta = 0$. The further role of the $\beta$ parameter will be highlighted in the next subsection. Maximum likelihood (ML) fitting of this model is not straightforward (model fitting treats $(\mathrm{NPS}_{\mathrm{Year}\,1}, \mathrm{NPS}_{\mathrm{Year}\,2})$ as dependent; see Agresti, 2013, Ch. 12), but it can be done using the R statistical software (R Core Team, 2019) through the specialized mph.fit function developed by Joseph Lang at the University of Iowa, which is contained in the hmmm package (Colombi et al., 2014). The ML marginal fitting method makes no assumptions about the model that describes the joint distribution $\pi_j$. Thus, when the model holds, the ML estimate of the parameters is consistent regardless of the dependence structure for that distribution.
The marginal homogeneity model ($H_0$: marginal homogeneity, $\beta = 0$; $H_1$: not $H_0$, $\beta \ne 0$) is validated through the likelihood ratio test $G^2$, which compares the model under investigation (marginal homogeneity) with the saturated (unconstrained) one. Under the null hypothesis, the test statistic $G^2$ follows the $\chi^2$ distribution with degrees of freedom, df, equal to the difference between the numbers of free parameters in the two models (the saturated model and the tested model). We reject the hypothesis that the selected model provides a good representation of the dataset when the p-value is less than the chosen significance level (usually 0.05).
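The ML fit through mph.fit in R is not reproduced here. As a lighter stand-in, the sketch below uses the classical Stuart-Maxwell statistic, a different and simpler test of the same null hypothesis of marginal homogeneity, which for a 3 × 3 table is referred to a χ² distribution with 2 degrees of freedom. It will not in general coincide with the likelihood ratio statistic G². Both tables are hypothetical, and the function name is an assumption.

```python
import math


def stuart_maxwell_3x3(table):
    """Stuart-Maxwell test of marginal homogeneity for a 3x3 table of counts.

    Returns (statistic, p_value); the p-value uses the chi-square
    survival function with 2 degrees of freedom, exp(-x / 2).
    """
    row = [sum(r) for r in table]
    col = [sum(table[i][j] for i in range(3)) for j in range(3)]
    # Differences of the first two marginal totals (the third is redundant).
    d1, d2 = row[0] - col[0], row[1] - col[1]
    # Estimated covariance matrix of (d1, d2) under the null hypothesis.
    v11 = row[0] + col[0] - 2 * table[0][0]
    v22 = row[1] + col[1] - 2 * table[1][1]
    v12 = -(table[0][1] + table[1][0])
    det = v11 * v22 - v12 ** 2
    if det == 0:
        raise ValueError("singular covariance matrix; test not defined")
    stat = (v22 * d1 ** 2 - 2 * v12 * d1 * d2 + v11 * d2 ** 2) / det
    return stat, math.exp(-stat / 2)


# Hypothetical tables: rows are Year 1, columns are Year 2,
# categories ordered detractor, passive, promoter.
shifted = [[0, 20, 0], [0, 10, 0], [0, 20, 50]]     # NPS 0.5 in both years
balanced = [[10, 8, 2], [5, 10, 5], [5, 2, 53]]      # identical marginals

stat_shifted, p_shifted = stuart_maxwell_3x3(shifted)
stat_balanced, p_balanced = stuart_maxwell_3x3(balanced)
print(stat_shifted, p_shifted)
print(stat_balanced, p_balanced)
```

The first table has equal NPS values in the two years but is clearly flagged as non-homogeneous, while the second, whose marginals coincide despite individual movements, is not.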

Cumulative odds ratio
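The cumulative marginal logit model assumes a proportional odds structure, which means that it has the same effect $\beta$ for each logit; indeed, this model satisfies Eq. (2):

$$\mathrm{logit}[P(\mathrm{NPS}_{\mathrm{Year}\,2} \le j)] - \mathrm{logit}[P(\mathrm{NPS}_{\mathrm{Year}\,1} \le j)] = \beta, \qquad j = 1, 2. \tag{2}$$

Therefore, the same proportionality constant applies to each logit. Furthermore,

$$\frac{P(\mathrm{NPS}_{\mathrm{Year}\,2} \le j) / P(\mathrm{NPS}_{\mathrm{Year}\,2} > j)}{P(\mathrm{NPS}_{\mathrm{Year}\,1} \le j) / P(\mathrm{NPS}_{\mathrm{Year}\,1} > j)} = \exp(\beta), \tag{3}$$

$$\frac{P(\mathrm{NPS}_{\mathrm{Year}\,2} \le j)}{P(\mathrm{NPS}_{\mathrm{Year}\,2} > j)} = \exp(\beta)\, \frac{P(\mathrm{NPS}_{\mathrm{Year}\,1} \le j)}{P(\mathrm{NPS}_{\mathrm{Year}\,1} > j)}. \tag{4}$$

Note that, in the above formulas, we have omitted the references to $x_t$, $t = 1, 2$, to simplify the notation. Indeed, from Eq. (4), the odds of the outcome $\mathrm{NPS}_{\mathrm{Year}\,2} \le j$ are $\exp(\beta)$ times the odds of $\mathrm{NPS}_{\mathrm{Year}\,1} \le j$ for $j = 1, 2$. This is why the cumulative marginal logit model is often called the "proportional odds model" (McCullagh, 1980). Note that an odds ratio of cumulative probabilities, as given by $\exp(\beta)$ in Eq. (3), is called a cumulative odds ratio.
We have already stated that, in the cumulative marginal logit model, the marginal homogeneity corresponds to $\beta = 0$. This implies that:

$$P(\mathrm{NPS}_{\mathrm{Year}\,1} \le j) = P(\mathrm{NPS}_{\mathrm{Year}\,2} \le j), \qquad j = 1, 2,$$

which means that the cumulative odds ratio $\exp(\beta)$ is equal to 1. In cases where the hypothesis of marginal homogeneity is rejected, the cumulative odds ratio becomes an interesting tool for measuring the proportionality between the odds.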
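The proportional odds structure can be checked numerically. The sketch below starts from a hypothetical Year 1 marginal, multiplies each cumulative odds by exp(β) to obtain a Year 2 marginal, and then verifies that the cumulative odds ratio is the same at both cut points. The function names and the Year 1 marginal are assumptions; the value β = 0.6437 is borrowed from the estimate the paper reports for Table 3, purely as an example.

```python
import math


def cumulative_odds(marginal):
    """Odds of a response <= j for every cut point j except the last.

    Assumes each cumulative probability lies strictly between 0 and 1.
    """
    odds, cum = [], 0.0
    for p in marginal[:-1]:
        cum += p
        odds.append(cum / (1.0 - cum))
    return odds


def shift_marginal(year1, beta):
    """Year 2 marginal implied by multiplying all cumulative odds by exp(beta)."""
    probs, previous = [], 0.0
    for odds in cumulative_odds(year1):
        shifted = math.exp(beta) * odds
        cum = shifted / (1.0 + shifted)
        probs.append(cum - previous)
        previous = cum
    probs.append(1.0 - previous)
    return probs


year1 = [0.20, 0.20, 0.60]  # hypothetical (detractor, passive, promoter)
beta = 0.6437
year2 = shift_marginal(year1, beta)
ratios = [o2 / o1 for o1, o2 in zip(cumulative_odds(year1), cumulative_odds(year2))]
print(year2)
print(ratios, math.exp(beta))
```

Setting beta to 0 returns the Year 1 marginal unchanged, which is the marginal homogeneity case with a cumulative odds ratio of 1.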

Results
Applying the marginal homogeneity model to the tables presented in Subsection 3.1, we obtain the results in Table 5. Obviously, Table 1 represents the marginal homogeneity situation.
Examining the marginals of the three tables considered in Table 5, the decisions according to the marginal homogeneity tests (Reject $H_0$, Reject $H_0$ and Do not reject $H_0$, respectively) are quite obvious for all three cases. Note that all three tables have broadly similar NPSs over time. Performing this kind of statistical test brings out details on the composition of the index that are hidden when looking at only a single number. Furthermore, it is worthwhile considering the situation described in Table 6.

In this case, $\mathrm{NPS}_{\mathrm{Year}\,1} = \mathrm{NPS}_{\mathrm{Year}\,2} = 0.5$ once again. The two indices have apparently similar compositions over time. In fact, 4 of the 22 detractors and 3 of the 72 promoters in Year 1 change their opinions. The marginal homogeneity model applied to this table gives the following results: $G^2 = 5.5460$ with p-value = 0.0625. Thus, at the usual 5% significance level, we do not reject the marginal homogeneity hypothesis, but we would reject it at a higher significance level (e.g. 10%). This situation highlights that even slight changes in the opinions of the detractors/promoters can have statistically significant consequences.
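The reported p-value can be checked by hand. The sketch below assumes that the test has 2 degrees of freedom (an assumption, although it reproduces the reported figure), in which case the χ² survival function reduces to exp(-x/2). The function name is hypothetical.

```python
import math


def chi2_sf_2df(x):
    """Survival function of the chi-square distribution with 2 degrees of freedom."""
    return math.exp(-x / 2)


g2 = 5.5460  # likelihood ratio statistic reported for Table 6
p_value = chi2_sf_2df(g2)

reject_at_5_percent = p_value < 0.05
reject_at_10_percent = p_value < 0.10
print(round(p_value, 4), reject_at_5_percent, reject_at_10_percent)
```

The decision flips between the 5% and 10% levels, which is the borderline behavior described in the text.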
As we already mentioned, in cases where the hypothesis of the marginal homogeneity is rejected, the cumulative odds ratio becomes an interesting tool for measuring the proportionality between the odds.Table 7 reports the estimated cumulative odds ratio of the tables for which the hypothesis of marginal homogeneity was rejected.
The estimated cumulative odds ratio comparing the marginals is $\exp(\hat{\beta})$, as highlighted in Eqs. (3) and (4). This means that in Table 2, the estimated odds of the response "detractor" in Year 2 for a randomly selected subject are $e^{0.0975} = 1.1$ times the estimated odds of the response "detractor" in Year 1 for another randomly selected subject. Additionally, the estimated odds of the response "detractor" or "passive" in Year 2 for a randomly selected subject are 1.1 times the estimated odds of the response "detractor" or "passive" in Year 1 for another randomly selected subject. Considering Table 3, the estimated odds of the response "detractor" in Year 2 for a randomly selected subject are $e^{0.6437} = 1.9$ times the estimated odds of the response "detractor" in Year 1 for another randomly selected subject. The estimated odds of the response "detractor" or "passive" in Year 2 for a randomly selected subject are 1.9 times the estimated odds of the response "detractor" or "passive" in Year 1 for another randomly selected subject.
At this point, it is worth comparing the situations represented in Tables 2 and 3. They present the same values of the NPS index in the two years being considered. Analysis of this single number could suggest similar situations. We have already highlighted how the composition of the components of the index differs in the two situations. In particular, in Table 2, the second year shows an improvement in "detractors" and a worsening in "promoters." In Table 3, however, the situation improves considerably from one year to the next. This difference between the two tables emerges with the marginal homogeneity test, which rejects the hypothesis of homogeneity. The fact that the two tables represent different cases thus begins to be evident. This evidence becomes even stronger with the use of the cumulative odds ratio. In Table 2, the odds of being in the same condition (detractor or passive) vary from one year to the other, but much less than in Table 3, where they almost double.
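As a quick check of the arithmetic above, the following minimal Python sketch converts the reported coefficient estimates into cumulative odds ratios. Only the two coefficient values come from the text; the dictionary and variable names are illustrative assumptions.

```python
import math

# Estimated coefficients reported in the text for Tables 2 and 3.
# The names below are illustrative, not part of the original analysis.
beta_hat = {"Table 2": 0.0975, "Table 3": 0.6437}

# Under a proportional odds structure the same odds ratio, exp(beta_hat),
# applies to every cumulative logit ("detractor", and "detractor or passive").
odds_ratios = {name: round(math.exp(b), 1) for name, b in beta_hat.items()}

for name, ratio in odds_ratios.items():
    print(f"{name}: estimated cumulative odds ratio = {ratio}")
```

Because the model assumes proportional odds, a single exponentiated coefficient per table summarizes the shift across both cumulative categories.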
Other situations are worth investigating as well. For example, consider Table 4. As already mentioned, the marginal homogeneity test indicates that Table 4 has homogeneous marginals, as is evident to the naked eye. In Table 4, therefore, one would expect an estimate of the cumulative odds ratio equal to 1. In this case, instead, it is equal to $e^{1.6797} = 5.4$, a value very far from 1. This apparently unexpected result is actually justified by the particular situation represented by Table 4, where there are so-called "compensations" in the marginal distributions. Therefore, an investigation of the homogeneity of the marginal distributions alone would not, in this case, have been sufficient to highlight the different compositions of the index in the two periods considered.

6. Discussion, implications and further research
6.1 Discussion
Many of the methods for measuring customer satisfaction cited in the Introduction use statistical techniques to obtain results on which to base business management strategies. For example, Structural Equation Modeling (SEM) is usually the technique for finding the customer satisfaction level and validating the causal relationship between customer satisfaction and its antecedents and consequences. This technique is, therefore, used to validate different types of customer satisfaction indices. The objective of the SERVQUAL methodology is usually to develop the best instrument for measuring customer satisfaction, and SEM, Factor Analysis or Multiple Regression analysis are usually used for choosing and validating the best service quality constructs among the proposed ones. Furthermore, the MUSA method follows the principles of ordinal regression analysis under constraints.
It should be noted that the literature dealing with the NPS has mainly focused on highlighting the weaknesses of the indicator. Solutions are suggested to overcome these weaknesses (e.g. changing the scale), but often no mathematical-statistical models are implemented to verify the validity of the proposed solutions. Other works provide indications as to how management should behave, e.g. by running additional surveys (see Subsection 1.1). An innovative methodological proposal is that of Rocks (2016), who, by defining the probabilistic context of the index, constructs confidence intervals around the estimated index value (see Section 2). In addition, Capecchi and Piccolo (2017) search for the distribution of the NPS based on a convenient structure of the response patterns. Furthermore, Baquero (2022) used fuzzy-set Qualitative Comparative Analysis (fsQCA) in the hotel industry to analyze the relationship between customer satisfaction and loyalty, as measured by the NPS, and dependent variables such as gastronomy, cleanliness, room comfort and the satisfaction expressed by clients with the reception area.
Our proposal stands as a bridge between the pure management approach and the application of statistical models. Its intent is to offer a statistical tool that is known in the literature and easy to use and read, in order to help company management read and interpret the NPS correctly. We are inspired by Deming's TQM philosophy. His PDCA cycle can be interpreted in our proposal as follows (see Johnson, 2002; Taufik, 2020):
(1) Plan: plan the change. Plan consists of setting goals and strategies to achieve specific results.
(3) Check: analyze the results and identify learnings.
(4) Act: take action based on what you learned in the check step.
Our proposal is summarized in Figure 4. Figure 4 can be interpreted as follows.
Plan: the company starts by computing $NPS_{t_1}$ and sets the goals and strategy to be achieved in the reference period.
Do: the company computes $NPS_{t_2}$ and compares it with $NPS_{t_1}$. If $NPS_{t_1} \neq NPS_{t_2}$, enter the check phase and evaluate future actions. Act: act, if needed, to achieve the business growth goals. If $NPS_{t_1} = NPS_{t_2}$, the same NPS value can actually represent very different situations; enter the check phase by first performing a marginal homogeneity test of $H_0$: marginal homogeneity vs $H_1$: not $H_0$, and then calculating the estimated cumulative odds ratio.
(1) If $H_0$ is not rejected and the estimated cumulative odds ratio is equal to 1, then we can consider the marginals to be homogeneous, as in Table 1. In this case, $NPS_{t_1} = NPS_{t_2}$ indicates an equal composition between the two indices. This scenario describes the situation in which the company has maintained a stable position over time with regard to the "loyalty" of its clients. There has been neither deterioration nor improvement. Act: in this case, the company, having assessed the degree of dynamism of the market in which it operates, may decide either to improve its market position, carrying out ad hoc surveys among its customers to find out which aspects to improve, or to maintain its established position in the market.
(2) If $H_0$ is not rejected and the estimated cumulative odds ratio is far from 1, then the marginals appear homogeneous only because of compensation, as in Table 4. In this case, $NPS_{t_1} = NPS_{t_2}$ does not indicate an equal composition between the two indices. To check how the situation evolved between $NPS_{t_1}$ and $NPS_{t_2}$, consider the estimated cumulative odds ratios and judge how the compositions have changed. This is the most ambiguous case: the first information given by the statistical analysis would lead to conclusions that are the opposite of those reached once the analysis is complete. This is the case that best highlights the criticality of the NPS index. Therefore, further statistical instruments are needed to confirm (or not) the information apparently provided by the index itself. Act: in this case, the company has to investigate further, choosing between a qualitative and a quantitative approach and taking advantage of the different methodologies existing in the literature.
(3) If $H_0$ is rejected and the estimated cumulative odds ratio is far from 1, then we can consider the marginals not to be homogeneous. In this case, $NPS_{t_1} = NPS_{t_2}$ but the composition of the two indices differs, as in Tables 2 and 3. To check how the situation has evolved between $NPS_{t_1}$ and $NPS_{t_2}$, consider the estimated cumulative odds ratios and judge how the compositions have changed. This scenario represents the most extreme theoretical situation, in which the company must understand how its position has changed, for better or worse, in order to implement any corrective actions. Act: in this case, the company that wants to improve its position in the market has to investigate further. In particular, it should carry out ad hoc surveys among its customers in order to understand the reasons why customers responded favorably or unfavorably. In addition to the single question used for the construction of the NPS, other questions could be added that aim to clarify the reasons why the customer gave a certain grade or score (Rajasekaran and Dinesh, 2018). In this sense, one could also proceed with the Net Emotional Value (NEV), i.e. try to analyze the customer's experience through the study of his or her emotions, thus creating a greater connection with the company itself (Achmad et al., 2020). Basically, companies that find themselves in this position are necessarily faced with an obvious situation of dissatisfaction on the part of their customers. A valid solution is to investigate this loss of consensus according to a qualitative or quantitative approach, taking advantage of the different methodologies existing in the literature.
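The three cases above can be sketched as a small decision function. This is only an illustration under stated assumptions: the function name, the significance level `alpha` and the tolerance `or_tol` used to decide whether the odds ratio is "far from 1" are hypothetical, since the text gives no numerical threshold. The combination "H0 rejected, odds ratio close to 1" is not discussed in the text and is flagged as such.

```python
def classify_nps_comparison(nps_t1, nps_t2, p_value, odds_ratio,
                            alpha=0.05, or_tol=0.05):
    """Map the outcome of the check phase to one of the described scenarios.

    alpha and or_tol are hypothetical thresholds chosen for illustration.
    """
    if nps_t1 != nps_t2:
        # The index itself has changed: evaluate future actions directly.
        return "nps_differs"

    h0_rejected = p_value < alpha
    or_near_one = abs(odds_ratio - 1.0) <= or_tol

    if not h0_rejected and or_near_one:
        return "case_1_stable_composition"       # as in Table 1
    if not h0_rejected and not or_near_one:
        return "case_2_compensation"             # as in Table 4
    if h0_rejected and not or_near_one:
        return "case_3_different_composition"    # as in Tables 2 and 3
    return "not_covered_in_text"                 # H0 rejected, ratio near 1


# Hypothetical inputs: the p-values are invented, the odds ratios 5.4 and 1.9
# are the values reported in the text for Tables 4 and 3.
print(classify_nps_comparison(30, 30, p_value=0.90, odds_ratio=5.4))
print(classify_nps_comparison(30, 30, p_value=0.01, odds_ratio=1.9))
```

In practice the choice of `or_tol` matters: the text treats Table 2, with an odds ratio of 1.1, as a case of different composition, so any tolerance would need to be justified for the application at hand.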
The aim of this work has been to draw attention to the indiscriminate use of the NPS index. In particular, we highlighted how the same NPS value can actually represent very different situations. We have proposed a statistical validation of the use of this index by suggesting structures that are already known in the literature and that can easily support the analysis of the NPS index.
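To make the point concrete, the sketch below computes the NPS as the percentage of promoters minus the percentage of detractors for two compositions. The counts are hypothetical and were chosen only so that the two index values coincide; they are not taken from the tables of this paper.

```python
def nps(detractors, passives, promoters):
    """Net Promoter Score: percentage of promoters minus percentage of detractors."""
    total = detractors + passives + promoters
    return 100.0 * (promoters - detractors) / total


# Hypothetical counts (detractors, passives, promoters) out of 100 respondents.
year_1 = (20, 30, 50)
year_2 = (10, 50, 40)

nps_year_1 = nps(*year_1)
nps_year_2 = nps(*year_2)
print(nps_year_1, nps_year_2)  # same index value, different composition
```

Both years give the same index even though detractors halved and promoters fell by ten, which is the kind of ambiguity the proposed check phase is meant to detect.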

Theoretical and practical implications
Following the research line traced by the study, various theoretical and practical implications can be derived.
Referring to theory, this study first contributes to the current literature by adopting a statistical approach to determine whether or not identical values of the index can be considered similar based on statistical assumptions, adding novel knowledge to an under-researched topic in the NPS literature.
Furthermore, it has already been pointed out that the management of a company often makes decisions based on subjective threshold values of the NPS index. Our proposal would make it possible to statistically validate the choice of these threshold values in a more objective manner. Our algorithm also allows for temporal comparisons of the index and can thus support PDCA actions to be carried out over time. By implication, the theory allows individuals and organizations to plan and continually improve themselves, their relationships, processes, products and services.
Another insight for scholars and NPS users is that a successful and rigorous implementation of our algorithm improves awareness of the actual loyalty of the customer base. In particular, note that the technical implementation of our algorithm is feasible with any basic statistical software, e.g. the free R software.
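As an indication of how little code a marginal homogeneity check requires, the sketch below implements the Stuart-Maxwell test for a 3x3 table using only the Python standard library. Note that this is a different, model-free test from the cumulative marginal logit (proportional odds) test used in this paper, swapped in here because it needs no model fitting. The table of counts is hypothetical, and it is assumed that rows index the Year 1 category and columns the Year 2 category (detractor, passive, promoter).

```python
import math


def stuart_maxwell_3x3(table):
    """Stuart-Maxwell test of marginal homogeneity for a 3x3 table of counts.

    Returns (statistic, p_value); under H0 the statistic is chi-square with 2 df.
    """
    row = [sum(r) for r in table]
    col = [sum(table[i][j] for i in range(3)) for j in range(3)]
    # Differences between the first two row and column margins.
    d1, d2 = row[0] - col[0], row[1] - col[1]
    # Estimated covariance matrix of (d1, d2) under H0.
    s11 = row[0] + col[0] - 2 * table[0][0]
    s22 = row[1] + col[1] - 2 * table[1][1]
    s12 = -(table[0][1] + table[1][0])
    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("singular covariance matrix; test not computable")
    # Quadratic form d' S^{-1} d written out for the 2x2 case.
    stat = (s22 * d1 * d1 - 2 * s12 * d1 * d2 + s11 * d2 * d2) / det
    p_value = math.exp(-stat / 2.0)  # chi-square survival function for 2 df
    return stat, p_value


# Hypothetical counts, not taken from the paper.
hypothetical_table = [
    [20, 10, 5],
    [3, 30, 15],
    [2, 5, 60],
]
stat, p_value = stuart_maxwell_3x3(hypothetical_table)
print(f"statistic = {stat:.3f}, p-value = {p_value:.4f}")
```

A small p-value leads to rejecting marginal homogeneity. As the discussion of Table 4 shows, a non-rejection should still be read together with the estimated cumulative odds ratio.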
Furthermore, we think our proposal can help a company improve its quality management.We are certainly not able to provide data on what causes dissatisfaction.We can, however, indicate that its NPS index has changed composition from one time instant to the next and, for example, point out to the company that some of its promoters have become passive.Many business firms are channeling more efforts to retain existing customers rather than acquiring new customers since the cost of acquiring new customers is greater than retaining the existing ones.This information will enable the company to activate all the procedures, which it is able to manage, in the Act phase of the PDCA cycle to achieve the goal of improving its next NPS.
In confirmation of what has already been presented, it is worth noting that large companies (e.g. HP and Sky) have already implemented the good practice of combining the "single question" survey of the NPS with a questionnaire investigating the reasons behind the summary judgment expressed by the NPS itself. In our opinion, these companies have already incorporated, in line with Deming's philosophy, the need to capture the reasons for customer satisfaction or dissatisfaction.

Limitations and further research
There has been a great deal of debate in the literature on the erroneous and illusory use of the NPS: there is no scientific confirmation of the link between the value of the index and growth in customer loyalty. Some scholars believe that ignoring the large proportion of passives is a big mistake. Being passive does not necessarily mean having a neutral stance; in fact, passives may be closer to detractors in their tendency to search for a better buying experience. There is also no evidence that the value of the NPS is a good predictor of future sales growth. Finally, the NPS is not even reliable in measuring the growth or decline of a company over time (Mecredy et al., 2018; Fisher and Kordupleski, 2019).
This work does not claim to be exhaustive of all the criticisms that have emerged regarding the NPS. Instead, we have tried to highlight the usefulness of NPS users possessing the basic statistical knowledge that is necessary to be able to use tools that make the index

TQM 35,9

Figure 1. PDCA cycle
Figure 2. Client categories used in evaluating NPS
Figure 3. NPS versus all possible proportions of detractors
Figure 4. Decision algorithm

Table 1. Equal NPSs in Year 1 and Year 2, identical composition of the NPS index components

Table 2. Equal NPSs in Year 1 and Year 2, different composition of the NPS index components
Such tests can differentiate between nominal and ordinal variables.

Table 6. Equal NPSs in Year 1 and Year 2, as in Tables