Assessing the overall fit of composite models estimated by partial least squares path modeling

Purpose – This study aims to examine the role of an overall model fit assessment in the context of partial least squares path modeling (PLS-PM). In doing so, it will explain when it is important to assess the overall model fit and provide ways of assessing the fit of composite models. Moreover, it will resolve major concerns about model fit assessment that have been raised in the literature on PLS-PM.
Design/methodology/approach – This paper explains when and how to assess the fit of PLS path models. Furthermore, it discusses the concerns raised in the PLS-PM literature about the overall model fit assessment and provides concise guidelines on assessing the overall fit of composite models.
Findings – This study explains that the model fit assessment is as important for composite models as it is for common factor models. To assess the overall fit of composite models, researchers can use a statistical test and several fit indices known through structural equation modeling (SEM) with latent variables.
Research limitations/implications – Researchers who use PLS-PM to assess composite models that aim to understand the mechanism of an underlying population and draw statistical inferences should take the concept of the overall model fit seriously.
Practical implications – To facilitate the overall fit assessment of composite models, this study presents a two-step procedure adopted from the literature on SEM with latent variables.
Originality/value – This paper clarifies that the necessity to assess model fit is not a question of which estimator will be used (PLS-PM, maximum likelihood, etc.) but of the purpose of statistical modeling. Whereas the model fit assessment is paramount in explanatory modeling, it is not imperative in predictive modeling. In particular, we examine the type I error rate and the statistical power of the bootstrap-based test.


1. Introduction
Over the past decade, composite models have drawn increasing interest in the context of structural equation modeling (SEM). The composite model is regarded as a viable alternative to the common factor model as a means to operationalize and relate abstract concepts from marketing and other disciplines (Sarstedt et al., 2016; Henseler, 2021). Different from the common factor model, the composite model expresses abstract concepts by emergent variables, i.e. composites of observed variables, instead of latent variables [1]. To interrelate emergent variables, two different models have been suggested: first, a model where emergent variables freely correlate (Schuberth et al., 2018), and second, a model where emergent variables are embedded in a structural model (Dijkstra, 2017).
Arguably, the most widespread estimator for composite models is partial least squares path modeling (PLS-PM; Wold, 1982). Its statistical properties are well studied (Dijkstra, 1985), and its use is appreciated by researchers across various fields, including marketing (Hair et al., 2012). PLS-PM can be used for various types of research (Henseler, 2018), and various enhancements have been developed over the past decade (Khan et al., 2019). Moreover, PLS-PM has been implemented in various statistical software packages, including commercial software such as ADANCO (Henseler and Dijkstra, 2017) or SmartPLS (Ringle et al., 2015) and open-source packages such as cSEM.
In SEM with latent variables, the overall model fit assessment is considered to be a crucial step (Kline, 2015). This assessment investigates whether the specified model is consistent with the data collected by exploiting constraints imposed on the observed variables' model-implied variance-covariance matrix, in line with the maxim that, "[i]f a model is consistent with reality, then the data should be consistent with the model" (Bollen, 1989, p. 68). The overall model fit can be assessed in two nonexclusive ways, namely, tests for the overall model fit and fit indices (Schermelleh-Engel et al., 2003). While the former are based on statistical inference, the latter are usually descriptive and quantify the misfit on a continuous scale.
Currently, the literature on PLS-PM takes divergent stands on the overall fit assessment. While proponents mainly follow the reasoning known from SEM with latent variables (Henseler et al., 2016; Henseler, 2017; Benitez et al., 2020), critics, dating back to Lohmöller (1989), have raised several concerns about the model fit assessment, which revolve around the following six arguments: (1) PLS-PM is focused on estimating causal-predictive relationships. (2) Assessing the model fit by means of a distance function is not useful in the context of PLS-PM (Hair et al., 2017, 2019a). (3) Fit indices based on the common factor model are not appropriate to assess a model estimated by PLS-PM (Lohmöller, 1989, p. 54). (4) Thresholds for fit indices have not been proposed for composite models (Hair et al., 2019b). (5) It is unclear whether the fit should be assessed based on the estimated model or on a model with a saturated structural model (Hair et al., 2019b). (6) Small misspecifications are not reliably detected by the bootstrap-based test for the overall model fit.
Apparently, it is not clear when the overall fit of composite models needs to be assessed, and if so, how it should be assessed, thereby giving rise to confusion. In light of this situation, our paper contributes to the PLS-PM literature in five ways. First, we clarify in which research situations the overall fit assessment of composite models estimated by PLS-PM is paramount. Second, we provide an overview of the available means of assessing the overall fit of composite models estimated by PLS-PM, i.e. a bootstrap-based test and various fit indices. Third, we address the concerns about model fit assessment raised in the PLS-PM literature. Fourth, by means of a Monte Carlo simulation, we demonstrate the finite-sample behavior of the bootstrap-based test for the overall model fit in combination with various fit measures and show that it is able to detect misspecified composite models. Fifth, we provide concise guidelines on how to assess the overall fit of composite models. Overall, we answer the questions of when and how to assess the fit of composite models estimated by PLS-PM and show that the raised concerns are mainly unfounded.

2. A formal definition of the composite model
The composite model is a model that is consistently estimated by PLS-PM (Dijkstra, 2017) [2]. It comprises several emergent variables, and each emergent variable $\eta_j$ is completely determined by a unique block of $K_j$ observed variables, $\eta_j = \boldsymbol{w}_j^\top \boldsymbol{x}_j$, where the vector $\boldsymbol{w}_j$ contains the weights of block $j$ and the vector $\boldsymbol{x}_j$ contains the $K_j$ observed variables of block $j$. It is assumed that all observed variables are standardized and that each observed variable is connected to only one emergent variable.
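In the composite model studied in the context of confirmatory composite analysis (CCA; Schuberth et al., 2018; Schuberth, 2020; Hubona et al., 2021), the emergent variables are typically allowed to freely correlate [3]. Hence, the emergent variables' variance-covariance matrix $\boldsymbol{\Phi}$ is unconstrained. The model-implied variance-covariance matrix of the observed variables $\boldsymbol{\Sigma}(\boldsymbol{\theta})$, where the vector $\boldsymbol{\theta}$ comprises the model parameters, can be expressed as a partitioned matrix:

$$\boldsymbol{\Sigma}(\boldsymbol{\theta}) = \begin{pmatrix} \boldsymbol{\Sigma}_{11} & \boldsymbol{\Sigma}_{12} & \cdots & \boldsymbol{\Sigma}_{1J} \\ \boldsymbol{\Sigma}_{21} & \boldsymbol{\Sigma}_{22} & \cdots & \boldsymbol{\Sigma}_{2J} \\ \vdots & \vdots & \ddots & \vdots \\ \boldsymbol{\Sigma}_{J1} & \boldsymbol{\Sigma}_{J2} & \cdots & \boldsymbol{\Sigma}_{JJ} \end{pmatrix} \tag{1}$$

where $J$ denotes the number of blocks. The variances and covariances of the observed variables of block $j$ are captured in the intra-block variance-covariance matrix $\boldsymbol{\Sigma}_{jj}$. Typically, all observed variables of one block freely correlate. The covariances between the observed variables of blocks $j$ and $l$ are captured in the inter-block covariance matrix $\boldsymbol{\Sigma}_{jl}$, with $j \neq l$. In contrast to the intra-block variance-covariance matrices, the inter-block covariance matrices contain constraints, namely, that the emergent variables carry all information between the blocks of observed variables:

$$\boldsymbol{\Sigma}_{jl} = \phi_{jl} \boldsymbol{\lambda}_j \boldsymbol{\lambda}_l^\top = \phi_{jl} \boldsymbol{\Sigma}_{jj} \boldsymbol{w}_j \boldsymbol{w}_l^\top \boldsymbol{\Sigma}_{ll} \tag{2}$$

where the scalar $\phi_{jl}$ equals the covariance between the emergent variables $\eta_j$ and $\eta_l$. The covariances between the emergent variable $\eta_j$ and its respective observed variables $\boldsymbol{x}_j$ are captured in the vector $\boldsymbol{\lambda}_j = \boldsymbol{\Sigma}_{jj} \boldsymbol{w}_j$. The latter are often labeled composite loadings.
Additionally, the emergent variables $\boldsymbol{\eta}$ can be embedded in a structural model (Dijkstra, 2017). Therefore, we distinguish between exogenous emergent variables ($\boldsymbol{\eta}_{exo}$) and endogenous emergent variables ($\boldsymbol{\eta}_{endo}$). The former contains emergent variables that are not explained by other emergent variables in the structural model. Equation (3) provides a formal representation of a linear structural model containing emergent variables:

$$\boldsymbol{\eta}_{endo} = \boldsymbol{B} \boldsymbol{\eta}_{endo} + \boldsymbol{\Gamma} \boldsymbol{\eta}_{exo} + \boldsymbol{\zeta} \tag{3}$$

The matrices $\boldsymbol{\Gamma}$ and $\boldsymbol{B}$ capture the respective coefficients of the exogenous and endogenous emergent variables. The error terms of each structural equation are captured in the vector $\boldsymbol{\zeta}$ and are assumed to have a mean of zero. For simplicity, it is assumed that they are mutually uncorrelated and uncorrelated with the exogenous emergent variables $\boldsymbol{\eta}_{exo}$. Consequently, the variance-covariance matrix of the emergent variables $\boldsymbol{\Phi}$ has the following structure:

$$\boldsymbol{\Phi} = \begin{pmatrix} \boldsymbol{\Phi}_{exo} & \boldsymbol{\Phi}_{exo} \boldsymbol{\Gamma}^\top \left[(\boldsymbol{I} - \boldsymbol{B})^{-1}\right]^\top \\ (\boldsymbol{I} - \boldsymbol{B})^{-1} \boldsymbol{\Gamma} \boldsymbol{\Phi}_{exo} & (\boldsymbol{I} - \boldsymbol{B})^{-1} \left(\boldsymbol{\Gamma} \boldsymbol{\Phi}_{exo} \boldsymbol{\Gamma}^\top + \boldsymbol{\Psi}\right) \left[(\boldsymbol{I} - \boldsymbol{B})^{-1}\right]^\top \end{pmatrix} \tag{4}$$

where $\boldsymbol{\Phi}_{exo}$ denotes the variance-covariance matrix of the exogenous emergent variables and $\boldsymbol{\Psi}$ denotes the (diagonal) variance-covariance matrix of the error terms $\boldsymbol{\zeta}$. The identity matrix $\boldsymbol{I}$ is of the same dimension as the coefficient matrix $\boldsymbol{B}$, and it is assumed that $\boldsymbol{I} - \boldsymbol{B}$ is nonsingular.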
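The structure of the emergent variables' variance-covariance matrix that follows from the linear structural model in Equation (3) can be illustrated numerically. The following Python sketch is only an illustration under stated assumptions: the function name `emergent_covariance`, the argument names `Phi_exo` (variance-covariance matrix of the exogenous emergent variables) and `Psi` (variance-covariance matrix of the error terms), the ordering of the variables (exogenous first) and all numerical values are hypothetical and not taken from the paper.

```python
import numpy as np

def emergent_covariance(B, Gamma, Phi_exo, Psi):
    """Covariance matrix of (eta_exo, eta_endo) implied by
    eta_endo = B @ eta_endo + Gamma @ eta_exo + zeta,
    with zeta uncorrelated with eta_exo."""
    B = np.asarray(B, dtype=float)
    Gamma = np.asarray(Gamma, dtype=float)
    Phi_exo = np.asarray(Phi_exo, dtype=float)
    Psi = np.asarray(Psi, dtype=float)
    n_endo = B.shape[0]
    # Reduced form: eta_endo = (I - B)^{-1} (Gamma eta_exo + zeta);
    # this requires I - B to be nonsingular.
    A = np.linalg.inv(np.eye(n_endo) - B)
    cov_endo_exo = A @ Gamma @ Phi_exo
    cov_endo = A @ (Gamma @ Phi_exo @ Gamma.T + Psi) @ A.T
    return np.block([[Phi_exo, cov_endo_exo.T],
                     [cov_endo_exo, cov_endo]])

# Hypothetical example: two correlated exogenous emergent variables
# explain one endogenous emergent variable.
Phi_exo = np.array([[1.0, 0.3],
                    [0.3, 1.0]])
Gamma = np.array([[0.4, 0.5]])
B = np.zeros((1, 1))
# Error variance chosen so that the endogenous variable has unit variance.
explained = (Gamma @ Phi_exo @ Gamma.T).item()
Psi = np.array([[1.0 - explained]])

Phi = emergent_covariance(B, Gamma, Phi_exo, Psi)
print(np.round(Phi, 3))
```

In this example, the error variance is chosen such that the endogenous emergent variable has unit variance, which mirrors the common practice of working with standardized emergent variables; other scalings are equally possible.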
The calculation of the model-implied variance-covariance matrix of the observed variables $\boldsymbol{\Sigma}(\boldsymbol{\theta})$, where the emergent variables are embedded in a structural model, is similar to that in the case in which the emergent variables are allowed to freely correlate, i.e. the variance-covariance structure has the same form as shown in Equation (1). However, the covariance $\phi_{jl}$ between the emergent variables $\eta_j$ and $\eta_l$ needs to be replaced by the corresponding element of the variance-covariance matrix of the emergent variables as implied by the structural model shown in Equation (4).
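To make the constraints of the composite model more tangible, the following Python sketch assembles the model-implied variance-covariance matrix from the intra-block matrices, the weights and the emergent variables' variance-covariance matrix. It is a minimal sketch under stated assumptions: the function name `implied_covariance`, its arguments and the numerical values are hypothetical, and the weights are scaled such that each emergent variable has unit variance.

```python
import numpy as np

def implied_covariance(intra_blocks, weights, Phi):
    """Model-implied covariance matrix of a composite model.

    Diagonal blocks are the intra-block matrices Sigma_jj; off-diagonal
    blocks equal phi_jl * lambda_j lambda_l', with composite loadings
    lambda_j = Sigma_jj w_j.
    """
    loadings = [S_jj @ w_j for S_jj, w_j in zip(intra_blocks, weights)]
    n_blocks = len(intra_blocks)
    rows = []
    for j in range(n_blocks):
        row = []
        for l in range(n_blocks):
            if j == l:
                row.append(intra_blocks[j])
            else:
                row.append(Phi[j, l] * np.outer(loadings[j], loadings[l]))
        rows.append(row)
    return np.block(rows)

# Hypothetical example: two blocks with two standardized indicators each.
S_11 = np.array([[1.0, 0.5], [0.5, 1.0]])
S_22 = np.array([[1.0, 0.5], [0.5, 1.0]])
# Equal weights scaled so that w' Sigma_jj w = 1 (unit-variance composites).
w_1 = np.ones(2) / np.sqrt(3.0)
w_2 = np.ones(2) / np.sqrt(3.0)
Phi = np.array([[1.0, 0.4], [0.4, 1.0]])

Sigma = implied_covariance([S_11, S_22], [w_1, w_2], Phi)
print(np.round(Sigma, 3))
# The composites reproduce the specified covariance between emergent variables.
print(round(float(w_1 @ Sigma[:2, 2:] @ w_2), 3))
```

If the emergent variables are embedded in a structural model, the same function can be used with `Phi` replaced by the matrix implied by the structural model.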
3. When to assess the overall fit of composite models
The type of research question (Leek and Peng, 2015; Henseler, 2018), and thus the purpose of the research, determines the type of statistical modeling. In general, two types of statistical modeling, which appear under various names, are differentiated (Hand, 2019), namely, explanatory and predictive modeling (Shmueli, 2010). Both types of modeling are concerned with the analysis of data. However, they differ in their purpose and treat data differently.
The purpose of predictive modeling is to provide accurate predictions. The statistical model is used to generate these predictions (Shmueli, 2010). The data at hand are typically split, and one part of the data is used to train the model, while the other part is used to validate the model. Often, the statistical model is treated as a black box (Breiman, 2001), harking back to the remark that, "[i]f a model is created to make this prediction, it should not be constrained by the requirement of interpretability" (Kuhn and Johnson, 2013, p. 4). Consequently, a predictive model does not need to be based on a proper theory but can be driven by data. In contrast, explanatory modeling aims at understanding the mechanisms and processes underlying the data at hand, the so-called data-generating process or population. Explanatory models are typically based on a researcher's theory and are often "simpler" than predictive models, because this facilitates their interpretation (James et al., 2013, Chapter 2.1.1). The data at hand are used for model estimation and testing (causal) hypotheses.
The type of statistical modeling determines how a model is validated. In predictive modeling, model validation focuses on the predictive power of a model, i.e. its ability to accurately predict new/unknown data. In contrast, in explanatory modeling, model validation investigates whether the specified model adequately describes the processes and mechanisms in question (Shmueli, 2010). Therefore, the adequacy of the specified model is of utmost importance because a wrongly specified model likely leads to wrong conclusions. Although in empirical research the line between predictive and explanatory modeling often may be blurred (e.g. various models from marketing research can be used for predictive purposes, see Leeflang and Wittink, 2000), there are instances where following the rules of explanatory modeling leads to suboptimal solutions in the sense of predictive modeling and vice versa (Ebbes et al., 2011).
SEM is typically regarded as an approach to explanatory modeling (Bollen, 1989; Kline, 2015). Structural equation models are specified in accordance with a theory and estimated to test this theory (Hayduk et al., 2007). To estimate the model parameters, consistent estimators are preferred because a researcher wants to be sure that for a large sample size and a correctly specified model, the estimates are close to the population values with a high probability. This fact is also recognized in the marketing literature: "If the model has a descriptive or normative purpose, consistent estimates are key" (Ebbes et al., 2011, p. 1121). This also highlights the problem of PLS-PM's inconsistency for reflective and causal-formative measurement models comprising latent variables (Dijkstra, 1985).
A crucial step of model validation in SEM is the overall model fit assessment, which means investigating how well the model explains the data (Kline, 2015, p. 120). If the model is an acceptable representation of reality, the data should be consistent with the model and thus with a researcher's theory from which the model is derived. To investigate the overall fit of a model, it is typically examined whether the constraints imposed by the model, which are reflected in the model-implied variance-covariance matrix of the observed variables, are consistent with the collected data, i.e. the sample variance-covariance matrix. It is emphasized that, "if SEM is used, then model fit testing and assessment is paramount, indeed crucial, and cannot be fudged for the sake of 'convenience' or simple intellectual laziness on the part of the investigator" (Barrett, 2007, p. 823). Against this background, the importance of model fit assessment does not depend on a particular estimator. However, different estimators allow for different ways of assessing an estimated model. The composite model estimated by PLS-PM serves the same purpose as the model known from SEM with latent variables, that is, it represents a researcher's theory. However, they differ in how abstract concepts are represented. While in latent variable models, abstract concepts are represented by latent variables, in composite models, abstract concepts are represented by emergent variables. If composite models are applied in the realm of explanatory modeling, assessing their overall model fit is of the same importance as in SEM with latent variables because it provides an important opportunity to empirically validate a researcher's theory.
While the concept of model fit is well elaborated for structural models containing latent variables that are estimated by maximum likelihood (ML) or related estimators (Hu and Bentler, 1999; Hayduk, 2014), in the context of composite models estimated by PLS-PM, the literature studying the overall model fit assessment is scarce. Hence, in the following section, we adopt methods of the overall model fit assessment from the literature on SEM with latent variables and explain how they can be used to assess the overall fit of composite models estimated by PLS-PM.
4. Ways to assess the overall fit of composite models
In the literature on SEM with latent variables, various methods of assessing the overall model fit have been proposed. They can be broadly categorized into statistical tests and fit indices. Arguably, the most famous test for the overall model fit is the $\chi^2$ test (Jöreskog, 1969). It is based on the asymptotic properties of the fitting function that is minimized by the ML estimator. Since in the context of PLS-PM, no such parametric test has been derived, a nonparametric bootstrap-based alternative was proposed (Dijkstra and Henseler, 2015a; Dijkstra, 2017). Typically, statistical tests for the overall model fit assess the exact model fit, i.e. the null hypothesis that the specified model is able to exactly reproduce all the variances and covariances among the observed variables in the population.
Although the statistical testing framework is theoretically appealing, testing the exact overall model fit has been criticized as highly unrealistic. The basis of this concern is rooted in Box's (1976) famous remark that "all models are wrong," which implies that the null hypothesis of a perfect fit is always wrong and whether the null hypothesis is rejected is only a matter of sample size. Following this reasoning, the exact model fit is typically not of actual interest to researchers who study a certain phenomenon by means of a model, which is a deliberate approximation of reality (Bentler and Bonett, 1980; Steiger and Lind, 1980). Against this background, researchers in the early 1980s started popularizing fit indices as an alternative and supplement to the exact model fit testing (Bentler and Bonett, 1980; Jöreskog and Sörbom, 1982). These indices can be roughly categorized into absolute and relative fit indices (McDonald and Ho, 2002). While absolute fit indices measure the correspondence between the specified model and the data along a continuum to gauge how well the model fits (Mulaik et al., 1989), relative fit indices compare the specified model to a reference model to assess the relative increase in model fit (Bentler, 1990). Consequently, fit assessment through fit indices becomes an assessment of approximate and comparative fit and is of a descriptive, instead of an inferential, nature.
The construction of some of the fit indices, such as the root mean square error of approximation (RMSEA; Steiger and Lind, 1980) and the non-normed fit index (NNFI; Bentler and Bonett, 1980), is directly related to the asymptotic distribution of the $\chi^2$ test statistic, which is derived from the ML estimator. Since such a test statistic with analogous properties has not been derived for PLS-PM, these fit indices are not considered in the remainder of the paper; instead, we focus on fit indices not tied to a specific estimator, i.e. the root mean square residual (RMR; Jöreskog and Sörbom, 1982), the standardized root mean square residual (SRMR; Bentler, 1995), the normed fit index (NFI; Bentler and Bonett, 1980) and the goodness-of-fit index (GFI; Jöreskog and Sörbom, 1993), and show that they are suitable for assessing composite models.

4.1 Bootstrap-based test for the exact overall model fit
A bootstrap-based test was suggested in the context of composite models to assess the null hypothesis of exact model fit, $H_0: \boldsymbol{\Sigma}(\boldsymbol{\theta}) = \boldsymbol{\Sigma}$ (Dijkstra, 2017). To draw conclusions about the null hypothesis, the discrepancy between the sample variance-covariance matrix $\boldsymbol{S}$ and the estimated model-implied counterpart $\boldsymbol{\Sigma}(\hat{\boldsymbol{\theta}})$ of the observed variables is considered. To measure the discrepancy between these two matrices, several fitting functions have been proposed, such as the fitting function of the ML estimator ($F_{ML}$; Jöreskog, 1970b). Moreover, in the context of PLS-PM and composite models, the geodesic distance ($d_G$), the squared Euclidean distance and the SRMR have been proposed (Dijkstra, 2017; Schuberth et al., 2018). All these fitting functions have one feature in common: they are equal to zero if the model perfectly fits the data and larger than zero otherwise. However, as suggested by Bollen and Stine (1992), other fit measures such as the NFI can also be used in combination with the bootstrap-based test.
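The discrepancy measures mentioned above can be sketched in a few lines of Python. The following code is an illustration under stated assumptions rather than a reference implementation: the function names are hypothetical, the scaling constant of 1/2 in the squared Euclidean and geodesic distances follows a common convention and is assumed here, and both matrices are assumed to be symmetric and positive definite.

```python
import numpy as np

def squared_euclidean_distance(S, Sigma):
    """Half the sum of squared element-wise differences."""
    return 0.5 * np.sum((S - Sigma) ** 2)

def geodesic_distance(S, Sigma):
    """Half the sum of squared log-eigenvalues of S^{-1} Sigma."""
    eigenvalues = np.linalg.eigvals(np.linalg.solve(S, Sigma)).real
    return 0.5 * np.sum(np.log(eigenvalues) ** 2)

def ml_fitting_function(S, Sigma):
    """ML discrepancy: ln|Sigma| - ln|S| + tr(S Sigma^{-1}) - p."""
    p = S.shape[0]
    _, logdet_sigma = np.linalg.slogdet(Sigma)
    _, logdet_s = np.linalg.slogdet(S)
    return logdet_sigma - logdet_s + np.trace(np.linalg.solve(Sigma, S)) - p

# Hypothetical example: the sample matrix contains a correlation of 0.5
# that the model-implied matrix (identity) does not reproduce.
S = np.array([[1.0, 0.5], [0.5, 1.0]])
Sigma = np.eye(2)

for distance in (squared_euclidean_distance, geodesic_distance, ml_fitting_function):
    # Each measure is zero for a perfect fit and positive otherwise.
    print(distance.__name__, distance(S, S), distance(S, Sigma))
```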
To obtain the reference distribution of a distance function under the null hypothesis, the bootstrap-based test relies on the transformed data set:

$$\tilde{\boldsymbol{X}} = \boldsymbol{X} \boldsymbol{S}^{-1/2} \boldsymbol{\Sigma}(\hat{\boldsymbol{\theta}})^{1/2} \tag{5}$$

where the matrix $\boldsymbol{X}$ contains the original data set, and $\boldsymbol{S}$ and $\boldsymbol{\Sigma}(\hat{\boldsymbol{\theta}})$ are the sample variance-covariance matrix and the estimated variance-covariance matrix implied by the composite model, respectively. The transformation of the original data set $\boldsymbol{X}$ is necessary to mimic a situation where the null hypothesis is true, i.e. the model perfectly fits the data at hand.
For monotonically increasing fit measures such as the previously presented fitting functions, the null hypothesis is rejected for a given significance level $\alpha$ if the value of the fit measure based on the original sample exceeds the $(1 - \alpha)$ quantile of the reference distribution. In contrast, for monotonically decreasing fit measures, the null hypothesis is rejected if the value of the fit measure based on the original sample is below the $\alpha$ quantile of the reference distribution. In such situations, a researcher has found empirical evidence against the specified model and, following Jöreskog (1969), can conclude that more information can be extracted from the data than is captured by the specified model.
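The logic of the bootstrap-based test can be sketched as follows. This Python code is a minimal sketch under stated assumptions, not the authors' implementation: all function names are hypothetical, and the estimator `independence_model` is a deliberately simple placeholder that returns a diagonal model-implied matrix. It is not PLS-PM; in practice, it would be replaced by a function that estimates the composite model and returns its model-implied variance-covariance matrix.

```python
import numpy as np

def _sym_power(A, power):
    """Power of a symmetric positive definite matrix via eigendecomposition."""
    values, vectors = np.linalg.eigh(A)
    return (vectors * values ** power) @ vectors.T

def squared_euclidean_distance(S, Sigma):
    return 0.5 * np.sum((S - Sigma) ** 2)

def independence_model(S):
    """Placeholder estimator (NOT PLS-PM): keeps only the variances."""
    return np.diag(np.diag(S))

def transform_to_null(X, S, Sigma_hat):
    """Transform the data so that their covariance matrix equals Sigma_hat."""
    X_centered = X - X.mean(axis=0)
    return X_centered @ _sym_power(S, -0.5) @ _sym_power(Sigma_hat, 0.5)

def bootstrap_fit_test(X, implied_cov, distance, n_boot=200, alpha=0.05, seed=0):
    """Bootstrap test of exact fit for a monotonically increasing fit measure."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    S = np.cov(X, rowvar=False)
    Sigma_hat = implied_cov(S)
    observed = distance(S, Sigma_hat)
    X_tilde = transform_to_null(X, S, Sigma_hat)
    reference = np.empty(n_boot)
    for b in range(n_boot):
        # Resample rows with replacement, re-estimate the model, recompute the distance.
        sample = X_tilde[rng.integers(0, n, size=n)]
        S_b = np.cov(sample, rowvar=False)
        reference[b] = distance(S_b, implied_cov(S_b))
    critical = np.quantile(reference, 1 - alpha)
    return observed, critical, observed > critical

# Hypothetical data: three variables with population correlations of 0.5,
# which the placeholder independence model cannot reproduce.
rng_demo = np.random.default_rng(1)
R = np.full((3, 3), 0.5)
np.fill_diagonal(R, 1.0)
X_demo = rng_demo.standard_normal((200, 3)) @ np.linalg.cholesky(R).T

observed, critical, reject = bootstrap_fit_test(
    X_demo, independence_model, squared_euclidean_distance)
print(observed, critical, reject)
```

For a monotonically decreasing fit measure, the comparison would instead check whether the observed value falls below the `alpha` quantile of the reference distribution.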

4.2 The (standardized) root mean square residual
The RMR is an absolute fit index proposed by Jöreskog and Sörbom (1982). The residuals are given as the elements of the matrix $\boldsymbol{S} - \boldsymbol{\Sigma}(\hat{\boldsymbol{\theta}})$. Consequently, the RMR shows the root mean square deviation of the sample variance-covariance matrix $\boldsymbol{S}$ from its estimated model-implied counterpart $\boldsymbol{\Sigma}(\hat{\boldsymbol{\theta}})$:

$$\mathrm{RMR} = \sqrt{\frac{2 \sum_{i=1}^{K} \sum_{j=1}^{i} \left(s_{ij} - \sigma_{ij}(\hat{\boldsymbol{\theta}})\right)^2}{K(K+1)}} \tag{6}$$

where $s_{ij}$ and $\sigma_{ij}(\hat{\boldsymbol{\theta}})$ are the elements from the $i$-th row and the $j$-th column of the sample variance-covariance matrix and the estimated model-implied variance-covariance matrix, respectively, and $K$ denotes the total number of observed variables. A disadvantage of the RMR is that its values depend on not only the misfit of the model but also the size of the (co-)variances of the observed variables. Consequently, interpreting the values without taking the scales of the observed variables into account is hardly possible.
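To overcome this problem, the SRMR was introduced (Bentler, 1995), which scales the residuals by the standard deviations of the respective observed variables:

$$\mathrm{SRMR} = \sqrt{\frac{2 \sum_{i=1}^{K} \sum_{j=1}^{i} \left[\left(s_{ij} - \sigma_{ij}(\hat{\boldsymbol{\theta}})\right) / \left(\sqrt{s_{ii}} \sqrt{s_{jj}}\right)\right]^2}{K(K+1)}} \tag{7}$$

where $K$ denotes the total number of observed variables. Consequently, the SRMR can be roughly interpreted as the average of the absolute value of residual correlations (Pavlov et al., 2021). Since in PLS-PM the observed variables are typically standardized before the analysis, the SRMR equals the RMR. The RMR and the SRMR are conceptually meaningful for the assessment of composite models. However, the variances and covariances implied by the composite model must be applied. In this case, both the RMR and the SRMR show desirable properties for composite models. For perfectly fitting composite models, i.e. $\boldsymbol{\Sigma}(\hat{\boldsymbol{\theta}}) = \boldsymbol{S}$, both the RMR and the SRMR equal zero.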
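The difference between the RMR and the SRMR can be illustrated with a short Python sketch. The function names `rmr` and `srmr` and the example matrices are assumptions made for illustration; the residuals are averaged over the lower triangle including the diagonal, i.e. over K(K+1)/2 elements.

```python
import numpy as np

def rmr(S, Sigma):
    """Root mean square residual over the lower triangle (incl. diagonal)."""
    K = S.shape[0]
    rows, cols = np.tril_indices(K)
    residuals = (S - Sigma)[rows, cols]
    return np.sqrt(np.mean(residuals ** 2))

def srmr(S, Sigma):
    """RMR of residuals scaled by the sample standard deviations."""
    sd = np.sqrt(np.diag(S))
    scale = np.outer(sd, sd)
    return rmr(S / scale, Sigma / scale)

# Hypothetical example with standardized variables: RMR and SRMR coincide.
S = np.array([[1.0, 0.5], [0.5, 1.0]])
Sigma = np.eye(2)
print(rmr(S, Sigma), srmr(S, Sigma))

# Rescaling the first variable changes the RMR but leaves the SRMR unchanged.
D = np.diag([2.0, 1.0])
S_rescaled = D @ S @ D
Sigma_rescaled = D @ Sigma @ D
print(rmr(S_rescaled, Sigma_rescaled), srmr(S_rescaled, Sigma_rescaled))
```

The second part of the example shows why the RMR is hard to interpret without knowing the scales of the observed variables, whereas the SRMR is invariant to such rescaling.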

4.3 The normed fit index
The NFI is a relative fit index that was originally proposed by Bentler and Bonett (1980). It measures the increase in fit relative to the fit of a null model. Although in general various null models are conceivable, in this paper, we focus on the independence model as the null model, which assumes that all observed variables are uncorrelated, i.e. that the model-implied variance-covariance matrix of the observed variables is a diagonal matrix (Bentler and Bonett, 1980, p. 596). Formally, the NFI is defined as follows (Bentler, 1990):

$$\mathrm{NFI} = \frac{F_0 - F_p}{F_0} = 1 - \frac{F_p}{F_0} \tag{8}$$

where $F_0$ and $F_p$ are the values of the fitting function for the null model and the proposed model, respectively. The principle of the NFI can be directly applied to composite models. The NFI represents the improvement in fit of the specified model against the null model as a proportion of the null model, i.e. the relative fit. If the specified composite model fits the data perfectly, i.e. $\boldsymbol{\Sigma}(\hat{\boldsymbol{\theta}}) = \boldsymbol{S}$, the distance function $F_p$ is equal to zero, and the NFI equals one. In contrast, if the specified model shows the same fit as the null model, the NFI takes a value of zero. The null model typically produces a worse fit than that of the originally specified model because it contains more parameter constraints. Therefore, the NFI typically ranges from zero to one.
Originally, the ML fitting function was proposed to measure the discrepancy between the sample and the model-implied variance-covariance matrix of the observed variables. In fact, the use of any fitting function that equals zero in the case of perfect fit and is monotonically increasing for increasing misfit is conceivable. This is the case for the discrepancy measures proposed in the context of PLS-PM, i.e. the geodesic distance, the squared Euclidean distance and the SRMR.
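As a hedged illustration, the following Python sketch computes the NFI with the independence model as the null model and the squared Euclidean distance as the fitting function. The function names and example matrices are hypothetical; any other fitting function that is zero for a perfect fit and increases with misfit could be passed instead.

```python
import numpy as np

def squared_euclidean_distance(S, Sigma):
    return 0.5 * np.sum((S - Sigma) ** 2)

def nfi(S, Sigma_model, distance):
    """Normed fit index relative to the independence (diagonal) null model."""
    Sigma_null = np.diag(np.diag(S))
    F_0 = distance(S, Sigma_null)   # misfit of the null model
    F_p = distance(S, Sigma_model)  # misfit of the proposed model
    # Note: F_0 is zero if S is already diagonal; the NFI is then undefined.
    return (F_0 - F_p) / F_0

# Hypothetical example: the proposed model implies a correlation of 0.4,
# whereas the sample correlation is 0.5.
S = np.array([[1.0, 0.5], [0.5, 1.0]])
Sigma_model = np.array([[1.0, 0.4], [0.4, 1.0]])

print(nfi(S, Sigma_model, squared_euclidean_distance))
print(nfi(S, S, squared_euclidean_distance))          # perfect fit
print(nfi(S, np.eye(2), squared_euclidean_distance))  # same fit as the null model
```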

4.4 Goodness-of-fit index
The GFI is an absolute fit index (Jöreskog and Sörbom, 1993). It appears to be inspired by the coefficient of determination known from regression analysis and "measures the relative amount of variances and covariances in the empirical covariance matrix $\boldsymbol{S}$ that is predicted by the model-implied covariance matrix $\boldsymbol{\Sigma}(\hat{\boldsymbol{\theta}})$" (Schermelleh-Engel et al., 2003, p. 42). The exact definition of the GFI depends on the fitting function used (Mulaik et al., 1989). Only recently has the GFI been proposed in the context of composite models, when it was defined by means of the unweighted least squares fitting function:

$$\mathrm{GFI} = 1 - \frac{\operatorname{tr}\left[\left(\boldsymbol{S} - \boldsymbol{\Sigma}(\hat{\boldsymbol{\theta}})\right)^2\right]}{\operatorname{tr}\left(\boldsymbol{S}^2\right)} \tag{9}$$

where $\operatorname{tr}$ denotes the trace operator, and $\boldsymbol{\Sigma}(\hat{\boldsymbol{\theta}})$ and $\boldsymbol{S}$ indicate the estimated variance-covariance matrix implied by the composite model and the empirical counterpart, respectively. Consequently, the GFI equals 1 if the composite model perfectly fits the data set, i.e. when $\boldsymbol{\Sigma}(\hat{\boldsymbol{\theta}}) = \boldsymbol{S}$, and values below 1 indicate a misfit.
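A GFI based on the unweighted least squares fitting function can be computed as sketched below. The sketch assumes the common trace-based form, one minus the ratio of the trace of the squared residual matrix to the trace of the squared sample matrix; the function name `gfi_uls` and the example matrices are hypothetical.

```python
import numpy as np

def gfi_uls(S, Sigma):
    """GFI based on the unweighted least squares fitting function."""
    residual = S - Sigma
    return 1.0 - np.trace(residual @ residual) / np.trace(S @ S)

# Hypothetical example: the model-implied matrix ignores a correlation of 0.5.
S = np.array([[1.0, 0.5], [0.5, 1.0]])
Sigma = np.eye(2)

print(gfi_uls(S, Sigma))  # below 1, indicating misfit
print(gfi_uls(S, S))      # equals 1 for a perfect fit
```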
5. Concerns about the overall model fit assessment in the context of PLS-PM
In the context of PLS-PM, several concerns regarding the overall model fit assessment have been raised. The following subsections discuss these concerns and provide a conclusion.
5.1 Concern 1: PLS-PM is focused on estimating causal-predictive relationships; hence, model fit assessment is of little value
The literature argues that PLS-PM was developed as an approach to causal-predictive modeling (Wold, 1982), and consequently, model fit assessment is of little value. Unfortunately, neither Wold (1982), the founder of PLS-PM, nor the literature that refers to his work provides a clear definition of causal-predictive modeling. Hence, it can have different meanings. First, and following Hair et al. (2019a) and recent literature that aims at demystifying the role of causal-predictive modeling (Chin et al., 2020), causal-predictive modeling could be a middle way between explanatory and predictive modeling that strives to achieve the goals of both explanatory and predictive modeling. Although this idea is striking, it can hardly be achieved, as "the 'wrong' model can sometimes predict better than the correct one" (Shmueli, 2010, p. 293), and explanatory power does not imply predictive power (Forster and Sober, 1994). Against this background, it is not clear why model fit assessment should be disregarded following this understanding of causal-predictive modeling. Second, causal-predictive modeling can mean that researchers make use of explanatory models to make predictions, i.e. model-based predictions. Compared to approaches known from predictive modeling, such as artificial neural networks (Haykin, 2009), this approach has the advantage of knowing how the predictions were made because explanatory models are usually "simple" to ensure their interpretability. On the other hand, it is likely that this approach is inferior to predictive models which are not tied to an explanatory model. Since this approach is based on an explanatory model, it is not clear why researchers should not rely on common principles of explanatory modeling, such as the overall model fit assessment in the context of SEM, to discard wrongly specified models. 
Similarly, Lohmöller (1989, p. 73) notes that "the predictive purpose should not jeopardize a structural-causal interpretation of the relation." It is well known that correctly specified models may exhibit high out-of-sample predictive accuracy, but reversing the argument is a logical fallacy (Saylors and Trafimow, 2020). In fact, several studies have shown that researchers relying on the measurement evaluation steps of PLS-SEM, which replace the overall model fit assessment with predictive measures, miss an important opportunity to detect wrongly specified models (McIntosh et al., 2014;Schuberth, 2020). Hence, replacing the overall model fit assessment with predictive measures is not recommended for researchers conducting explanatory modeling, regardless of whether predictions are subsequently made.
Conclusion: Since causal-predictive modeling is not clearly defined in the PLS-PM literature, researchers should not use it as a justification to omit the step of the overall model fit assessment when they are working (at least partially) in the realm of explanatory modeling, i.e. testing theories and drawing statistical inferences.
5.2 Concern 2: Assessing the model fit by means of distance measures makes no sense in the context of PLS-PM because PLS-PM does not minimize these distances
The PLS-PM literature is concerned about the overall model fit assessment by means of distance functions that measure the discrepancy between the estimated model-implied and the sample variance-covariance matrix of the observed variables (Hair et al., 2019b). Similar concerns have been raised about the bootstrap-based test for the overall model fit, which is based on a distance function and "should be considered with extreme caution" (Hair et al., 2019a, p. 31). These concerns are rooted in the fact that PLS-PM does not minimize such a distance function, in contrast to the ML estimator, to obtain the parameter estimates.
In fact, PLS-PM does not minimize a distance function to obtain the parameter estimates but iteratively estimates several regressions by ordinary least squares. However, as shown by Dijkstra (2017), PLS-PM produces consistent estimates for the composite model like the ML estimator does for common factor models (Jöreskog, 1970b). Moreover, both the ML estimator and PLS-PM are Fisher consistent for common factor and composite models, respectively, and are asymptotically normal (Dijkstra, 2010). Consequently, PLS-PM shows similar statistical properties for composite models as the ML estimator shows for common factor models, although they obtain their estimates differently.
To assess the overall model fit by means of a distance function, it is reasonable to assume a consistent estimator because it produces a consistently estimated model-implied variance-covariance matrix. Otherwise, a distance function would indicate a model misfit even when the sample size converges to infinity, which is of course not desirable. The way in which the estimates are produced plays only a minor role as long as they are consistent. In contrast, the specified model is of much greater importance because an estimator loses its statistical properties, such as consistency, if applied to the wrong model. Hence, it is not clear why model fit assessment by means of a distance function should only work for models that have been estimated by an estimator that minimizes that distance function. The SEM literature has already provided examples of model fit assessment by means of a distance function in cases where an estimator was used that does not minimize such a distance function (Devlieger et al., 2019). Similarly, the SRMR, which can be regarded as a type of distance function, is often considered to assess common factor models that have been estimated by ML; the ML estimator does not minimize the SRMR. This provides additional support against the claim that the overall model fit assessment by means of a distance function makes no sense if the estimator does not minimize this distance function.
In general, quantifying the misfit between the estimated model-implied and sample variance-covariance matrix can be done by any function that accepts these two matrices as input. However, for interpretational purposes, it is desirable that these functions have some particular properties. First, they should be equal to zero if the two matrices are identical, i.e. zero should indicate a perfect fit. Second, they should monotonically increase with an increasing difference between the two matrices. Meeting these requirements, a larger value of the distance function indicates a larger misfit of the model. It is noted that the ML fitting function, the SRMR, the squared Euclidean distance and the geodesic distance function meet these requirements. However, at this stage, the threshold values up to which the discrepancy in the model fit is regarded as acceptable remain unclear (see Subsection 5.4).
To assess the exact model fit via statistical significance testing, one needs to have the (asymptotic) distribution of the distance function under the null hypothesis, i.e. that the model-implied variance-covariance matrix based on the population parameters equals the population variance-covariance matrix of the observed variables, $H_0: \Sigma(\theta) = \Sigma$. This distribution depends on several aspects, such as the distance function used and the distributions of the estimated model-implied and the sample variance-covariance matrix. For example, it is well known that the number of observations minus one times the ML fitting function based on the ML estimates given multivariate normally distributed observed variables asymptotically follows a $\chi^2$ distribution under the null hypothesis of exact fit (Jöreskog, 1970a). In contrast, for distance functions based on PLS-PM estimates, such a distribution is generally not known.
To overcome the distributional assumptions, a bootstrap-based test was developed that can be used to assess the null hypothesis of exact fit (Beran and Srivastava, 1985, and Section 4.1). Although this test was first proposed to assess structural equation models containing latent variables (Bollen and Stine, 1992), it can be applied in the same way to assess structural models containing emergent variables (Dijkstra, 2017). It does not require an estimator that minimizes a specific distance but rather a consistently estimated model-implied variance-covariance matrix (Beran and Srivastava, 1985) given by an estimator that produces consistent parameter estimates [4]. This is the case for PLS-PM using Mode B if applied to estimate composite models (Dijkstra, 2017).
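The logic of such a test can be sketched in Python as follows. The sketch assumes the commonly described variant in which the data are first transformed so that the null hypothesis holds exactly in the bootstrap world, and in which the observed distance is compared with a quantile of the bootstrap distances. The function `diagonal_model` is a deliberately simple stand-in for an estimator returning a model-implied variance-covariance matrix; it is not PLS-PM, and all names and settings are hypothetical.

```python
import numpy as np

def sym_power(A, p):
    """Power of a symmetric positive definite matrix via its eigendecomposition."""
    vals, vecs = np.linalg.eigh(A)
    return (vecs * vals ** p) @ vecs.T

def squared_euclidean(S, Sigma_hat):
    return 0.5 * float(np.sum((S - Sigma_hat) ** 2))

def diagonal_model(S):
    """Toy stand-in estimator: a model implying uncorrelated observed variables."""
    return np.diag(np.diag(S))

def bootstrap_fit_test(X, implied_cov, distance, n_boot=499, alpha=0.05, seed=0):
    """Return (observed distance, bootstrap critical value, reject H0?)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    S = np.cov(X, rowvar=False)
    Sigma_hat = implied_cov(S)
    d_obs = distance(S, Sigma_hat)
    # Transform the centered data so that its sample covariance equals
    # Sigma_hat, i.e. the model fits exactly in the bootstrap population.
    X_h0 = (X - X.mean(axis=0)) @ sym_power(S, -0.5) @ sym_power(Sigma_hat, 0.5)
    d_boot = np.empty(n_boot)
    for b in range(n_boot):
        sample = X_h0[rng.integers(0, n, size=n)]  # resample rows with replacement
        S_b = np.cov(sample, rowvar=False)
        d_boot[b] = distance(S_b, implied_cov(S_b))
    critical = float(np.quantile(d_boot, 1 - alpha))
    return d_obs, critical, d_obs > critical

# Illustrative data: correlated variables, for which the toy model is wrong.
rng = np.random.default_rng(1)
true_cov = np.full((3, 3), 0.5) + 0.5 * np.eye(3)
X = rng.multivariate_normal(np.zeros(3), true_cov, size=300)
d_obs, critical, reject = bootstrap_fit_test(X, diagonal_model, squared_euclidean)
print(d_obs, critical, reject)
```

Nothing in the sketch requires the estimator to minimize the chosen distance; any function that maps a sample matrix to a model-implied matrix can be passed in.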
Conclusion: Distance functions and the bootstrap-based test can be used to assess the overall fit of composite models, even though PLS-PM does not minimize such a distance function.

Concern 3: Fit indices that are based on the common factor model are not appropriate to assess a model estimated by PLS-PM
The PLS-PM literature became concerned quite early about the use of fit indices based on a common factor model to assess models estimated by PLS-PM (Lohmöller, 1989, p. 54).
It is generally not appropriate to estimate common factor models by PLS-PM because it produces inconsistent estimates for this type of model (Dijkstra, 1985). Consequently, evaluating the fit of a common factor model estimated by PLS-PM is not recommended because even for a correctly specified model and a sample size that converges to infinity, fit indices would indicate a misfit. Researchers who want to apply PLS-PM to estimate common factor models should instead use consistent partial least squares and its enhancements (Dijkstra and Henseler, 2015a,b; Rademaker et al., 2019), which provide consistent estimates for common factor models.
Typically, fit indices are based on the model-implied variance-covariance matrix, which captures the constraints imposed by the underlying model. As shown in Section 2, not only the common factor model but also the composite model impose such constraints. These constraints can be exploited in fit indices to assess the overall fit of composite models. Obviously, it is important to apply the variance-covariance matrix implied by the composite model (see also Section 4).
Conclusion: The fit of the composite model can be assessed by fit indices proposed in the context of SEM with latent variables if the variance-covariance matrix implied by the common factor model is replaced by the one implied by the composite model.
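As a minimal illustration, the following Python sketch computes two such indices from a sample matrix and a model-implied matrix. It assumes standard definitions from the SEM literature, namely an NFI based on the ML fitting function with an independence baseline model and an unweighted least squares variant of the GFI; the definitions used elsewhere in this article may differ in detail. Nothing in the computation depends on whether the implied matrix stems from a common factor model or a composite model. Function names and matrices are hypothetical.

```python
import numpy as np

def ml_fit(S, Sigma):
    """ML fitting function: log|Sigma| + tr(S Sigma^{-1}) - log|S| - K."""
    k = S.shape[0]
    _, logdet_sigma = np.linalg.slogdet(Sigma)
    _, logdet_s = np.linalg.slogdet(S)
    return float(logdet_sigma + np.trace(S @ np.linalg.inv(Sigma)) - logdet_s - k)

def nfi(S, Sigma_hat):
    """Normed fit index relative to an independence (diagonal) baseline model."""
    baseline = np.diag(np.diag(S))
    return 1.0 - ml_fit(S, Sigma_hat) / ml_fit(S, baseline)

def gfi_uls(S, Sigma_hat):
    """Unweighted least squares GFI: 1 - tr((S - Sigma_hat)^2) / tr(S^2)."""
    residual = S - Sigma_hat
    return 1.0 - float(np.trace(residual @ residual) / np.trace(S @ S))

# Illustrative (made-up) matrices; Sigma_hat would be the matrix implied by
# the estimated composite model.
S = np.array([[1.0, 0.5, 0.3],
              [0.5, 1.0, 0.4],
              [0.3, 0.4, 1.0]])
Sigma_hat = np.array([[1.0, 0.5, 0.2],
                      [0.5, 1.0, 0.4],
                      [0.2, 0.4, 1.0]])

print(nfi(S, S), gfi_uls(S, S))                  # perfect fit
print(nfi(S, Sigma_hat), gfi_uls(S, Sigma_hat))  # some misfit
```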

Concern 4: Thresholds for fit indices have not been proposed for composite models
The PLS-PM literature reveals concerns about the fact that no threshold values for fit indices have been proposed for composite models (Hair et al., 2019a). As a consequence, it is difficult for researchers applying PLS-PM to judge the absolute and relative fit of their models.
The SEM literature has suggested various threshold values for fit indices that can be applied to judge common factor models (Hu and Bentler, 1999), while in the context of composite models, only a single study proposes such thresholds. Although we think that fit indices are helpful to quantify the degree of model misfit, we are skeptical about comparing the value of a fit index to a threshold value derived from simulation studies to decide whether the model fit is acceptable. As highlighted in the SEM literature, this approach is problematic in several ways: First, it is very difficult, if not impossible, to generalize such thresholds beyond the simulation design because the distribution of fit indices is influenced by factors other than the degree of misspecification that fit indices attempt to quantify (Yuan, 2005). Consequently, deriving and proposing threshold values is of little benefit for applied researchers, whose research setting likely differs from the design of the simulation study. Second, deriving threshold values through a simulation is based on a flawed logic because the degree of misfit that is still regarded as acceptable is determined by the simulation designer in advance (Marsh et al., 2004). Hence, it falls to the subjective judgment of the simulation designer to determine which model fits are acceptable or unacceptable. Third, deriving threshold values for fit indices under the hypothesis of exact fit contradicts the logic underlying absolute fit indices. Although the SEM literature began to embed fit indices in a testing framework (Bollen and Stine, 1992), absolute fit indices were originally introduced to overcome the issue of exact fit. Hence, determining threshold values as quantiles of the distribution of a fit index under perfect fit contradicts the very concept of approximate fit.
In fact, it was shown in the context of SEM with latent variables that the conventional $\chi^2$ test outperforms the index-plus-threshold-value decision strategy in distinguishing correctly from incorrectly specified models (Marsh et al., 2004).
Conclusion: The use of fit indices is a controversial topic in the literature on SEM with latent variables. The concerns can generally be transferred to the composite model. While opponents call for abandoning the use of fit indices (Barrett, 2007), there are also more optimistic voices that regard fit indices as useful tools to assess the overall model fit. For example, fit indices can be beneficial in situations where the sample size is large and the test for exact fit rejects the null hypothesis, although it is only trivially false (Bentler, 2007). Hence, we take a more liberal stand and recommend reporting fit indices along with the results of the test for exact model fit, because they can provide additional information. However, we recommend against the common practice of comparing fit indices to threshold values derived by simulation studies to judge whether the fit of a composite model is acceptable because this approach suffers from logical inconsistency (Marsh et al., 2004).

Concern 5: PLS-PM is used in case of small sample sizes for which small misspecification is not reliably detected by the bootstrap-based test
The literature argues that PLS-PM is often used with sample sizes for which the bootstrap-based test shows only low statistical power, i.e. misspecification often remains undetected. Hence, the argument goes, the test is of little value in the context of PLS-PM.
It is well known that the power of a statistical test decreases with decreasing sample size as the sampling uncertainty increases (Cohen, 1988, Chapter 1). Hence, this behavior is not an idiosyncrasy of the bootstrap-based test but applies to all statistical significance tests.
Small sample sizes are particularly concerning in the context of explanatory modeling, and the importance of sufficiently large sample sizes has already been highlighted in the context of SEM (Kline, 2015) and marketing research (Sawyer and Ball, 1981). Hence, researchers using PLS-PM who are working in the realm of explanatory modeling are advised to collect a sufficient amount of data before conducting their analysis. As recognized by Rigdon (2016, p. 600), one could say that "PLS path modeling will produce parameter estimates even when [the] sample size is very small, but reviewers and editors can be expected to question the value of those estimates, beyond simple data description." To address this issue, researchers using SEM with latent variables are usually advised to investigate a priori whether the size of the collected sample is sufficiently large to ensure that the statistical test being used has sufficient power, e.g. by conducting Monte Carlo simulations (Wolf et al., 2013). The same approach is also recommended in the context of PLS-PM (Aguirre-Urreta and Rönkkö, 2015). In principle, similar guidelines can be followed to assess the statistical power of the bootstrap-based test of the overall fit of composite models. However, such guidelines have not yet been elaborated.
Conclusion: Like all statistical significance tests, the power of the bootstrap-based test for the overall model fit depends on the sample size. If analysts deem the statistical power too low, they should collect more data. To not test a model is the worst option: It corresponds to a statistical power of zero.

Concern 6: It is not clear whether fit should be assessed based on the estimated model or a model with a saturated structural model
The PLS-PM literature raises concerns about which model should actually be assessed, i.e. the estimated model or the model with a saturated structural model (Hair et al., 2019b).
Recent PLS-PM guidelines for explanatory modeling recommend first assessing the composite model with a saturated structural model and subsequently assessing the originally specified model (Henseler et al., 2016;Benitez et al., 2020); see Section 7 for a more elaborate presentation. The idea of this approach is rooted in the two-step procedure that has been proposed in the context of SEM with latent variables (Anderson and Gerbing, 1988). Among applied researchers, this approach is regarded as beneficial because it allows us to localize the source of misfit, i.e. whether the composition of the emergent variables (first step) or the complete model (second step) is problematic. Ultimately, it is the originally specified model that represents a researcher's theory, and therefore, its fit is what needs to be assessed.
Conclusion: Analysts should assess the fit of their originally specified model. Assessing the fit of a model with a saturated structural model can serve as a useful intermediate step in model fit assessment to localize potential sources of misfit.

Monte Carlo simulation
An important and still open question is the efficacy of the bootstrap-based test for the overall fit and the various fit measures presented, i.e. the geodesic distance, the SRMR, the NFI and the GFI. To answer this question, we conduct a Monte Carlo simulation. Since the comparison of fit indices to derived threshold values has been strongly criticized in the SEM literature (Marsh et al., 2004), we deliberately do not aim at deriving any threshold values for these fit measures but instead investigate their finite-sample performance in combination with the bootstrap-based test for the exact overall model fit. In particular, we examine the type I error rate and the statistical power of the bootstrap-based test.
We consider three scenarios comprising three different population models. Each population model consists of three emergent variables. The three scenarios, including their population models, parameters and variance-covariance matrices, are displayed in Figure 1. Since the bootstrap-based test was recently evaluated with regard to wrongly specified relationships between observed variables and emergent variables (Schuberth et al., 2018), we exclusively focus on misspecifications in the structural model. Therefore, in all population models, only the structural model differs across the scenarios, whereas the weights and the intra-block correlation matrices are kept constant.
Scenario 1 is considered to assess the test's type I error rate. In this scenario, the estimated model matches the population model, and thus, the estimated model is correctly specified. As shown in Figure 1, the SRMR and the geodesic distance are equal to zero if they are calculated for the estimated model based on the population variance-covariance matrix. Similarly, the NFI and GFI show a value of 1. For this scenario, we expect that the bootstrap-based test produces rejection rates close to the predefined significance level.
Scenarios 2 and 3 serve to assess the statistical power of the bootstrap-based test. In Scenario 2, the estimated model does not match the population model, i.e. the estimated model is misspecified. As seen in Figure 1, in the population model of Scenario 2, there is a direct effect between the emergent variables $\eta_1$ and $\eta_3$ that is omitted in the estimated model. Consequently, the SRMR and the geodesic distance show values larger than 0 for the estimated model based on the population variance-covariance matrix. Similarly, the NFI and GFI values are 0.86 and 0.97, respectively, in this scenario. Therefore, we expect that the bootstrap-based test for the overall model fit produces rejection rates above the predefined significance level. In the population model of Scenario 3, the role of the emergent variables $\eta_1$ and $\eta_2$ in the structural model is switched in comparison to that in the estimated model. Consequently, the estimated model is misspecified. As shown in Figure 1, the SRMR and the geodesic distance are 0.11 and 0.08, respectively, and highlight a misfit of the estimated model based on the population variance-covariance matrix. Similarly, the GFI and NFI values are smaller than 1. Against this background, we expect that the bootstrap-based test produces rejection rates above the predefined significance level.
It is noteworthy that the different fit measures assess the two misspecifications differently. As shown in Figure 1, the SRMR, the NFI and the GFI indicate a worse model fit for the model in Scenario 3, whereas the geodesic distance indicates a worse fit for the model in Scenario 2. We expect that this will also be reflected in the test's rejection rates.
To study the finite-sample behavior of the bootstrap-based test, we consider sample sizes of 50, 100, 250, 500, 1,000 and 2,000 observations per sample. Moreover, we consider two significance levels, namely, 1% and 5%. As is common, for larger sample sizes, we expect an increase in the statistical test's power when the estimated model is indeed misspecified. Similarly, we expect higher statistical power in the case of a higher significance level.
The complete simulation was conducted in the statistical programming environment R (R Core Team, 2020). For each condition, 1,000 samples were drawn from a multivariate normal distribution with mean zero and the variance-covariance matrix of the respective scenario using the mvrnorm function of the MASS package (Venables and Ripley, 2002). To estimate the specified model by PLS-PM, the csem function of the cSEM package was used. For the inner weighting, the factorial weighting scheme was used, and for the estimation of the weights, Mode B was applied. As a stopping criterion, the absolute change in the weights was considered: the algorithm stopped once the largest absolute difference was smaller than $10^{-5}$. Furthermore, the maximum number of iterations was set to 1,000. To run the bootstrap-based test for the overall model fit, the testOMF function of the cSEM package was used. Although we did not face any convergence issues for the initial PLS-PM estimations, we replaced estimations that may not have converged during the bootstrap to ensure that all bootstrap-based tests are based on 499 valid bootstrap runs.
Figure 2 illustrates the results of our simulation. For Scenario 1, i.e. the scenario in which the estimated model is correctly specified, the test produces rejection rates slightly below the predefined significance level for small sample sizes, i.e. $n \leq 100$, regardless of the assumed significance level and the fit measure used. However, for an increasing sample size, the produced rejection rates converge toward the assumed significance level.
Considering Scenarios 2 and 3, i.e. in the case of model misspecification, the rejection rates are below the recommended threshold of 80% (Cohen, 1988) for very small sample sizes, i.e. n = 50, regardless of the fit measure used. However, in line with our expectations, the produced rejection rates increase for an increasing sample size, and for sample sizes larger than 100 observations, the produced rejection rates are above 80%. Moreover, the rejection rates are higher for the larger significance level, confirming our expectations. Comparing the performance of the fit measures across Scenarios 2 and 3, the results are largely in line with our expectations. The bootstrap-based test in combination with the geodesic distance rejects the model in Scenario 2 more often than the model in Scenario 3, while the tests based on the SRMR and the GFI detect the misspecification of Scenario 3 more reliably. Considering the bootstrap-based test in combination with the NFI, the results are not that clear, i.e. in some conditions, it rejects the model from Scenario 2 more often, while in other conditions, it rejects the model from Scenario 3 more often. We would have expected it to reject the model from Scenario 3 more often.
To conclude, the bootstrap-based test for the overall model fit in combination with the presented fit measures is able to detect model misspecification and produces rejection rates close to the predefined significance levels when the estimated model is correctly specified. However, a sufficient sample size is required to achieve satisfactory statistical power. For all considered fit measures, the bootstrap-based test behaved as expected, i.e. the rejection rates increased for an increasing sample size and/or larger significance levels when the estimated model was misspecified. However, the sensitivity of the studied fit measures for model misspecification differs with regard to the kind of misspecification. The geodesic distance indicates a larger misfit for the model in Scenario 2 than for the model in Scenario 3, while the SRMR and GFI show the opposite. For the NFI, the picture is not that clear.

Guidelines on the assessment of the overall fit of composite models
Figure 3 depicts our guidelines to assess the overall fit of composite models estimated by PLS-PM. To assess the overall fit of composite models including a structural model, we recommend a two-step procedure known from current guidelines on the use of PLS-PM in confirmatory and explanatory research (Benitez et al., 2020). In the first step, a CCA is conducted, while in the second step, the fit of the originally specified model is assessed. This way of model fit assessment is recommended, as it is a logical necessity that the abstract concepts be properly operationalized before the analysis of the structural model is performed (Anderson and Gerbing, 1982). To illustrate the approach, we focus on a researcher who derived from her theory the model displayed in Figure 4.
In the first step, a CCA is conducted, i.e. a model in which the emergent variables freely correlate is estimated, and its overall fit is assessed. Typically, the originally specified model is nested in this model, i.e. the originally specified model contains more restrictions on the parameters than the model with freely correlated emergent variables. Figure 5 displays the model for our researcher from the first step. This model exhibits the same fit as the originally specified model from Figure 4 with a saturated structural model.
An unsatisfactory fit in the first step indicates that the operationalization of the abstract concepts as emergent variables should be reconsidered, as the emergent variables do not convey all the information between the observed variables from two different blocks. Consequently, there are problems in the composition of at least one emergent variable. In contrast, if the fit of the model in the first step is satisfactory, the researcher can continue with the second step.
In the second step, the originally specified model (the model from Figure 4) is estimated and assessed. If the model does not show an acceptable fit, the structural model is likely misspecified. For our fictitious researcher, this can mean that the emergent variable $\eta_2$ does not fully mediate the effect of $\eta_1$ on $\eta_3$.
The advantage of the two-step approach in comparison to a one-step approach is that a researcher can better localize the source of misfit. If misfit is detected, regardless of the step in which it occurs, the researcher is advised to inspect its source. For example, a researcher can follow guidelines known from the SEM literature (Kline, 2015) and investigate the residuals. Moreover, in reporting the model fit assessment results, we recommend providing the outcomes of the criteria mentioned in Section 4.
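The decision logic of the two-step procedure can be summarized in a short Python sketch. The function and its return messages are hypothetical; the inputs are assumed to be the outcomes of the test for exact fit in each step.

```python
def two_step_assessment(step1_rejected, step2_rejected=None):
    """Summarize the two-step fit assessment as a diagnosis.

    step1_rejected: True if the test for exact fit rejects the model with
        freely correlating emergent variables (the CCA step).
    step2_rejected: True if the test rejects the originally specified model;
        only consulted when the first step shows a satisfactory fit.
    """
    if step1_rejected:
        # Misfit in step 1 points to the composition of the emergent variables.
        return "reconsider the composition of the emergent variables"
    if step2_rejected is None:
        return "step 1 passed; assess the originally specified model next"
    if step2_rejected:
        # Misfit arising only in step 2 points to the structural model.
        return "structural model is likely misspecified"
    return "no evidence against the originally specified model"

print(two_step_assessment(step1_rejected=False, step2_rejected=True))
```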

Discussion
The overall model fit assessment in the context of SEM is crucial if SEM is applied for explanatory modeling. Its importance is widely acknowledged in the SEM literature, although not without controversies, e.g. the discourse in the special issue of the journal Personality and Individual Differences (Vernon and Eysenck, 2007). In contrast, for composite models estimated by PLS-PM, it is less clear when and how to assess the overall model fit. To address these issues, we explain that the overall fit assessment of composite models is of utmost importance if composite models are studied in the context of explanatory modeling. Thus, the role of the overall fit assessment is unaffected by the way that abstract concepts are modeled, i.e. as latent or emergent variables. Moreover, we present a bootstrap-based test and four fit indices and show that they are all suitable for assessing the overall fit of composite models estimated by PLS-PM.
The PLS-PM literature has raised several concerns about model fit assessment and its applicability when PLS-PM is used for model estimation (Lohmöller, 1989; Hair et al., 2017, 2019a,b, 2020). The present study discusses these concerns and shows that most of them are unfounded. The current understanding of causal-predictive modeling does not warrant omission of the overall model fit assessment if researchers use the composite model and PLS-PM for theory testing. If PLS-PM is used in the context of explanatory modeling, the overall model fit assessment is a pivotal step. Moreover, the use of the overall model fit criteria that are based on the model-implied variance-covariance matrix is appropriate to assess composite models even though these criteria were first developed for common factor models. However, the variance-covariance matrix implied by the composite model must be applied. Similarly, composite models estimated by PLS-PM can be assessed by means of distance functions even though PLS-PM does not minimize such a function to obtain the parameter estimates. The bootstrap-based test for the overall model fit can be used to assess the exact fit of a composite model. As shown by our simulation, it is able to detect misspecified models estimated by PLS-PM in finite samples and it can also be used in combination with fit indices such as the NFI and the GFI. Although its statistical power might be insufficient owing to small sample sizes, this is no reason to abandon the bootstrap-based test. However, it is important that researchers are aware of that risk. Finally, fit indices can quantify the approximate and relative fit of composite models.
To support researchers applying PLS-PM in the overall fit assessment of their composite models, the current study provides concise guidelines, i.e. a two-step assessment procedure. While in the first step, a CCA is conducted, in the second step, the originally specified model is assessed. This approach helps researchers better localize the source of misfit. For each step, we recommend reporting the results of the bootstrap-based test and the values of the SRMR, the NFI and the GFI. It is emphasized that researchers who act in the realm of explanatory modeling should take model fit assessment seriously; otherwise, they will miss an important opportunity for model validation. Guidelines on PLS-PM for explanatory research that discourage the assessment of model fit resemble cooking recipes that suggest a visual and haptic inspection but at the same time discourage tasting the meal.
Our study is limited to the bootstrap-based test for the overall model fit and fit indices that have been proposed to assess the overall fit of composite models. In general, other ways have been suggested to assess composite models, including prediction tests, prediction metrics, tests for rank restrictions on submatrices and the exploitation of differences between different estimators (Dijkstra, 2017; Shmueli et al., 2019; Liengaard et al., 2020). However, none of these should replace the overall model fit assessment in the context of explanatory modeling. Moreover, we limit our focus to the (S)RMR, the NFI and the GFI, as the principles of these fit indices are not tied to the asymptotic properties of a specific estimator. Although we have shown that in principle, the NFI and the GFI can detect misspecified composite models, the SEM literature has shown that they are affected by the sample size and model complexity (Hu and Bentler, 1998, 1999; Sharma et al., 2005). Therefore, alternatives such as the NNFI (Bentler and Bonett, 1980) have been proposed. In this regard, we recommend investigating whether the principles underlying the NNFI, and similarly the RMSEA, also apply to composite models estimated by PLS-PM.
Furthermore, our simulation study showed that the fit measures assess misspecifications differently. Therefore, future research should identify situations in which a specific fit measure is preferred.
Similarly, our proposed guidelines are limited to linear and recursive models estimated by PLS-PM. The limitation to PLS-PM is in no way mandatory, and other estimators that produce consistent estimates for composite models are valid alternatives. Moreover, in empirical research, scientists encounter situations where models are non-recursive, e.g. the models contain feedback loops (Dijkstra, 2017). It is noted that the presented guidelines can still be applied for this type of model. Additionally, non-recursive models often provide the opportunity to exploit the involved overidentification restriction through statistical tests such as the Sargan-Hansen test (Sargan, 1958) to investigate whether the postulated assumptions required for identification hold.
In SEM, the issue of equivalent models is well known in the literature (Raykov and Penev, 1999) and often encountered in empirical research (MacCallum et al., 1993). Equivalent models exhibit identical levels of model fit, i.e. they all produce the same model-implied variance-covariance matrix even when the model parameter estimates differ (Raykov and Penev, 1999). Consequently, the overall model fit assessment cannot help identify the correct model among all equivalent models. It is obvious that the issue of equivalent models is not specific to latent variable models but also applies to composite models. Hence, to validate a model, a researcher needs to argue why his/her model should not be rejected in favor of an equivalent model (Kline, 2015).

Notes
1. The notion of an emergent variable is used to emphasize that the composite conveys all the information between its antecedents and its consequences and that it is on the same level as a latent variable. Moreover, emergent variables composed of latent variables, emergent variables, or a mixture of both are conceivable (Van Riel et al., 2017; Schuberth et al., 2020). However, in this article, we focus on emergent variables made up of observed variables.
2. Only recently, it was shown that a special type of composite model in which the emergent variables are composed of correlation weights can be consistently estimated by PLS-PM Mode A (Cho and Choi, 2020). This type of composite model is a special case of the composite model presented by Dijkstra (2017) and Schuberth et al. (2018) and can also be consistently estimated by PLS-PM Mode B.
3. It is emphasized that we do not refer to the measurement evaluation steps known from PLS-SEM, which have also been recently dubbed as confirmatory composite analysis. For a comparison of the two, we refer to Schuberth (2020).