Fiss (2007) and Meyer, Tsui, and Hinings (1993) suggest that many of the problems in empirical research on organizational configurations derive from a mismatch between methods and theory. Configurational theory suggests a clean break with the predominant linear paradigm. Rather than implying singular causation and linear relationships, a configurational approach assumes complex causality and nonlinear relationships where “variables found to be causally related in one configuration may be unrelated or even inversely related in another” (Meyer et al., 1993, p. 117). The linear paradigm relies on null hypothesis significance testing (NHST) – proposing alternative hypotheses that directional relationships exist (e.g., increases in variable X associate with increases in variable Y).

Are Fiss (2007) and Meyer et al. (1993) correct? If they are correct, why do almost all empirical studies that report forecasting models rely on using symmetric tests (e.g., correlation, multiple regression analysis (MRA), and structural equation modeling (SEM)) and reporting NHST findings? If such analytical tools as symmetric tests and NHST represent bad science, what alternative data analytical tools should researchers use? This volume in Advances in Business Marketing & Purchasing answers these questions and provides examples of using configurational modeling that are somewhat precise outcome tests (SPOTs). The volume in your hands or on your screen suggests that you stop using NHST and symmetric tests such as correlation, MRA, and SEM. The chapters in this volume describe complexity theory tenets and provide examples mostly from the business-to-business strategy, marketing, and purchasing literatures on why and how to build asymmetric models using configurations of antecedent conditions.

Yes, Fiss (2007) and Meyer et al. (1993) are correct on all counts. Additional researchers (Armstrong, 2012; Gigerenzer, 1991; Hubbard, 2016; Ziliak & McCloskey, 2008) – who have carefully reviewed and documented research on the usefulness of data analytic methods – reach the same or similar conclusions. Using symmetric tests such as correlation analysis, MRA, and SEM misrepresents the information quality and quantity that a researcher can mine from a data set. Usually researchers decide to use symmetric tools and NHST automatically, without explicit thinking about the availability and usefulness of asymmetric tools and SPOT. Most researchers propose theories in strategic management, finance, organizational behavior, marketing, and management at the case level but then do symmetric tests on the basis of relationships among variables. Offering case theory and doing variable-relationship testing is the mismatch that Fiss (2007) and Meyer et al. (1993) identify. Both the construction and the testing of theory at the case level using asymmetric tests of configurational causal statements are possible, and several examples are available (e.g., Frösén, Luoma, Jaakkola, Tikkanen, & Aspara, 2016; McClelland, 1998; Ordanini, Parasuraman, & Rubera, 2014; Wu, Yeh, Huan, & Woodside, 2014).
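A small numerical sketch may help to show what a symmetric test can miss. The thirty cases below are hypothetical and constructed only for illustration, and the function names are assumptions rather than part of any established package. Every case with the condition present (X = 1) shows the outcome (Y = 1), yet cases without the condition split evenly between the outcome and its absence. The asymmetric statement "X is sufficient for Y" holds without exception, while the symmetric correlation suggests only a moderate relationship.

```python
from math import sqrt

# Hypothetical cases (x, y): 1 = condition or outcome present, 0 = absent.
cases = [(1, 1)] * 10 + [(0, 1)] * 10 + [(0, 0)] * 10


def pearson_r(pairs):
    """Symmetric measure: Pearson correlation between x and y."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    return cov / sqrt(var_x * var_y)


def consistency(pairs, x_value, y_value):
    """Asymmetric measure: share of cases with x == x_value showing y == y_value."""
    selected = [y for x, y in pairs if x == x_value]
    return sum(1 for y in selected if y == y_value) / len(selected)


r = pearson_r(cases)
high_x_high_y = consistency(cases, 1, 1)  # does X present imply Y present?
low_x_low_y = consistency(cases, 0, 0)    # does X absent imply Y absent?

print(f"Pearson r = {r:.2f}")
print(f"Share of X-present cases with Y present: {high_x_high_y:.2f}")
print(f"Share of X-absent cases with Y absent:   {low_x_low_y:.2f}")
```

In this constructed example the correlation is 0.50, the first share is 1.00, and the second share is 0.50. The variable-level summary averages over two quite different case-level patterns.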

The dominant practice in the teen years of the 21st century in constructing forecasting models relating to strategic management is to perform MRA and SEM and test resulting models for fit of the predictions of the observations for a dependent variable. However, “Achieving a good fit to observations does not necessarily mean we have found a good model, and choosing the model with the best fit is likely to result in poor predictions. Despite this, Roberts and Pashler (2000) estimated that, in psychology alone, the number of articles relying on a good fit as the only indication of a good model runs into the thousands” (Gigerenzer & Brighton, 2009, p. 118). These studies are examples of shallow analysis that are accurately describable as examples of the rubbish that saddens McCloskey (2002).

The editor-in-chief of at least one journal, Basic and Applied Social Psychology, has now banned the practice of reporting NHST findings as well as confidence intervals from future articles accepted for publication. The announcement by Trafimow and Marks (2015) of the ban on NHST and confidence intervals confirms Hubbard’s (2016) and Ziliak and McCloskey’s (2008) troublemaker status in attempting to replace bad science with good science. This action exemplifies Gigerenzer’s (2004, p. 604) call for courage, “To stop the [NHST] ritual, we also need more guts and nerves. We need some pounds of courage to cease playing along in this embarrassing game. This may cause friction with editors and colleagues, but it will in the end help them to enter the dawn of statistical thinking.” NHST and fit-only testing of regression models are the pervasive practices in articles appearing in all elite and otherwise ranked journals in management and marketing today. As Hubbard (2016) documented, such corrupt theory construction and testing has dominated these literature streams since the early 1960s. Gigerenzer (2008, p. 170) explained that these practices are procedures of bad science, “Statistical packages allow every difference, interaction, or correlation against chance to be tested. They automatically deliver ratings of ‘significance’ in terms of stars, double stars, and triple stars, encouraging the bad after-the-fact habit. The general problem Feynman (1998) addressed is known as overfitting. Fitting a model to data that is already obtained is not sound hypothesis testing, even if the resulting explained variance, or R², is impressive. The reason is that one does not know how much noise one has fitted, and the more adjustable parameters one has, the more noise one can fit. Psychologists habitually fit rather than predict, and rarely test a model on new data, such as by cross-validation (Roberts & Pashler, 2000). Fitting per se has the same problems as storytelling after the fact, which leads to a ‘hindsight bias’ (Hoffrage, Hertwig, & Gigerenzer, 2000).”
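The difference between fitting and predicting can be sketched with simulated data. The data-generating rule (y equals x plus substantial random noise), the sample sizes, and the two models below are assumptions chosen only for illustration. The "memorizer" is a deliberately extreme stand-in for a model with many adjustable parameters: it reproduces the outcome of the closest fitted case. It fits the original sample perfectly because it has fitted the noise, and for that reason it should predict a fresh sample worse than a simple straight line does.

```python
import random

rng = random.Random(0)  # fixed seed so that the sketch is repeatable


def simulate(n):
    """Hypothetical data: y = x + noise, with noise larger than the signal."""
    xs = [rng.uniform(0.0, 1.0) for _ in range(n)]
    return [(x, x + rng.gauss(0.0, 1.0)) for x in xs]


def fit_line(data):
    """Least squares with one predictor; returns a prediction function."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in data)
             / sum((x - mean_x) ** 2 for x, _ in data))
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def fit_memorizer(data):
    """Maximally flexible model: predict the y of the closest fitted case."""
    return lambda x: min(data, key=lambda case: abs(case[0] - x))[1]


def mse(model, data):
    """Mean squared error of a model's predictions on a data set."""
    return sum((y - model(x)) ** 2 for x, y in data) / len(data)


train, test = simulate(400), simulate(400)  # fitting sample, new sample
line, memorizer = fit_line(train), fit_memorizer(train)

fit_error = {"line": mse(line, train), "memorizer": mse(memorizer, train)}
prediction_error = {"line": mse(line, test), "memorizer": mse(memorizer, test)}

for name in ("line", "memorizer"):
    print(f"{name:>9}: fit error = {fit_error[name]:.2f}, "
          f"prediction error on new cases = {prediction_error[name]:.2f}")
```

Choosing between the two models by fit alone would select the memorizer. Judging them on the new sample reverses the ranking, which is the reason for testing models on holdout cases rather than reporting fit only.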

Symmetric testing of the statistical significance of directional hypotheses is pervasive in the literature of business marketing and purchasing. Unfortunately, the evidence is abundant that the dominant logic of symmetric testing of directional hypotheses is bad practice and contributes to bad science. Symmetric tests include correlation analysis, the F-test, MRA, and SEM. Researchers perform symmetric tests in most instances with the hope of rejecting null hypotheses. The null hypothesis is a prediction that the relationship between two variables (X and Y) is statistically equal to zero or that the difference between firms or customers in group A versus group B in beliefs, attitudes, and behaviors is equal to zero. Tools for symmetric testing appearing in most articles in today’s leading scholarly journals of finance, management, marketing, and psychology include computing correlations (r’s) and b coefficients in multiple regression analyses. The NHST examines whether or not an observed r or b coefficient differs from zero to such an extent that the observed difference is unlikely to have occurred by chance alone (p < .05 or p < .01). The p < .05 indicates that, if the null hypothesis were true, a finding at least as large as the one observed would occur fewer than five times in one hundred if the analysis were done 100 times using the same data collection instruments on separate samples.
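The dependence of the test on sample size can be shown with a short calculation. The sketch below assumes the standard test statistic for a correlation, t = r·sqrt((n − 2)/(1 − r²)), and approximates its two-sided p-value with the normal distribution, which is reasonable only for large n. The correlation of .03 and the three sample sizes are hypothetical values chosen for illustration.

```python
from math import erfc, sqrt


def approx_two_sided_p(r, n):
    """Approximate two-sided p-value for the null hypothesis of zero correlation.

    Uses t = r * sqrt((n - 2) / (1 - r**2)) with a normal approximation
    to the t distribution (an assumption that suits large samples only).
    """
    t = r * sqrt((n - 2) / (1 - r ** 2))
    return erfc(abs(t) / sqrt(2))


r = 0.03  # hypothetical correlation that explains under 0.1% of the variance
p_by_n = {n: approx_two_sided_p(r, n) for n in (100, 1_000, 10_000)}

for n, p in p_by_n.items():
    print(f"n = {n:>6}: r = {r}, approximate p = {p:.4f}")
```

The same substantively trivial correlation is far from significant with 100 or 1,000 cases and falls below p < .01 with 10,000 cases. The star rating therefore reflects sample size as much as the strength of the relationship.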

Critics of the use of NHSTs describe the severe limitations of NHST. One criticism is that all observed findings in NHST differ statistically from zero if the sample of cases is very large (n > 5,000). Second, the study of which variables are statistically different from zero and which variable measurements do not differ from zero does not provide information on which configurations of conditions are present that indicate the consistent occurrence of a specific outcome. The primary research focus needs to be on identifying the configurations of ingredients that accurately and consistently predict high performance (or low performance). In reading the chapters in the present ABMP volume, the reader learns how to construct theories of complex configurations of conditions that are sufficient in identifying specific outcomes consistently. “Consistently” refers to the model accurately predicting the same outcome frequently with few, if any, false positives, when testing the model on cases from new samples. Note in reading that a trade-off occurs between accuracy and coverage. Models that are highly accurate in forecasting specific cases (prediction odds of 10 correct to 1 incorrect case) usually include a greater number of conditions than simpler, less accurate models (e.g., a statement that includes only three conditions and achieves an accuracy of four correct to one mistaken case identifications). The hope is that the reader reaching the final sentence of this preface is intrigued sufficiently to read the first chapter and then the rest of the volume. Good reading!

Arch G. Woodside



