The APP procedure for estimating Cohen's effect size

Purpose – Cohen's d, which indexes the difference in means in standard deviation units, is the most popular effect size measure in the social sciences and economics. Not surprisingly, researchers have developed statistical procedures for estimating the sample sizes needed to have a desirable probability of rejecting the null hypothesis given assumed values for Cohen's d, or for estimating the sample sizes needed to have a desirable probability of obtaining a confidence interval of a specified width. However, for researchers interested in using the sample Cohen's d to estimate the population value, these are insufficient. Therefore, it would be useful to have a procedure for obtaining the sample sizes needed to be confident that the sample Cohen's d to be obtained is close to the population parameter the researcher wishes to estimate, an expansion of the a priori procedure (APP).

Design/methodology/approach – The authors derive the necessary mathematics, provide computer simulations and links to free and user-friendly computer programs, and analyze real data sets for illustration of the main results.

Findings – In this paper, the authors answered the following two questions. The precision question: How close do I want my sample Cohen's d to be to the population value? The confidence question: What probability do I want to have of being within the specified distance?

Originality/value – To the best of the authors' knowledge, this is the first paper for estimating Cohen's effect size using the APP method. It is convenient for researchers and practitioners to use the online computing packages.


Introduction
Cohen famously argued that researchers should be concerned not only with whether an effect is present but also with the size of the effect. Cohen discussed a variety of different effect size indices, and other researchers have added to the effect size toolbox. Nevertheless, for typical studies, where economic data for two groups are compared, economic data for two countries are compared, or where an experimental group is compared to a control group, Cohen's d remains by far the most popular effect size index. As will be explained in more detail in the subsequent section, Cohen's d denotes the difference in means divided by the standard deviation. Thus, Cohen's d provides valuable information about how much the means differ in standard deviation units. One reason that scientists in the social and economic sciences have found Cohen's d useful is that many of the dependent measures in these sciences do not have intrinsic meaning. For instance, whereas it might be reasonably clear what a dollar means, the meaning of a unit on an attitude scale might be less clear. If the mean attitude in the experimental condition is 2 and the mean attitude in the control condition is 1, is this a small difference or a large one? By converting the difference in means to Cohen's d, the researcher can gain an idea of the size of the difference in standard deviation units, even when the scale units are not themselves intrinsically meaningful. Another advantage of Cohen's d is that it facilitates comparisons across studies. Even if scale units differ across studies, thereby rendering them seemingly impossible to compare, researchers can still compare them in terms of standard deviation units. Many researchers have taken advantage of this, particularly in meta-analytic research. Despite the popularity of Cohen's d and its obvious usefulness, there remains an important limitation.
Specifically, the Cohen's d that a researcher obtains in a particular experiment is a sample statistic. It is not a population value. Typically, researchers are not interested in sample statistics for their own sake, but because they provide useful estimates of population values. Thus, there is an important question that has not been properly addressed: how well does Cohen's d estimate the population effect size? Although researchers have long known how to compute traditional confidence intervals for Cohen's d, traditional confidence intervals do not properly address the question. This is because, for example, although 95% of 95% confidence intervals surround the population parameter, it is not the case that the population parameter has a 95% chance of being within a particular 95% confidence interval; this last probability is unknown. In addition, Trafimow and Uhalt (2020) have shown that sample confidence intervals tend to be inaccurate representations of population confidence intervals unless sample sizes are much larger than those typically employed. An alternative way to address the issue is to use the a priori procedure (APP), which has been employed previously in a variety of ways not pertaining to Cohen's d (e.g. Li et al., 2020; Trafimow, 2017, 2019; Trafimow and MacDonald, 2017; Trafimow et al., 2020a; Wang et al., 2020; Wei et al., 2020). Although the APP uses confidence intervals, it does so in a way that deviates importantly from traditional confidence intervals. To use APP thinking to address Cohen's d, the researcher would ask the two questions below.
RQ1. The precision question: How close do I want my sample Cohen's d to be to the population value?
RQ2. The confidence question: What probability do I want to have of being within the specified distance?
For example, the researcher might wish to have a 95% probability of obtaining Cohen's d within a tenth of a standard deviation of the population value. The present goal is to determine the sample size the researcher needs to collect to meet the precision and confidence specifications in the contexts of independent and matched samples experimental designs. This paper is organized as follows. Definitions of Cohen's effect sizes for both populations and samples are given in Section 2, together with properties of noncentral t-distributions. In Section 3, the APP methods are applied for estimating the population effect size θ in the independent-samples case and θ_D in the matched-samples case. In Section 4, simulation studies, coverage rates and real data examples are provided to support the main results of Section 3. Concluding remarks are given in Section 5.
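The two specifications can be made concrete by brute force for any candidate sample size: simulate many pairs of samples, compute the sample Cohen's d in each, and record how often it lands within the precision f of the population value. The sketch below is our illustration, not the authors' analytic method of Section 3; the values θ = 0.5, f = 0.1 and c = 0.95, and the function name `coverage`, are ours.

```python
import numpy as np

def coverage(n, theta, f, reps=5_000, seed=0):
    """Monte Carlo estimate of P(|d - theta| <= f), where d is the sample
    Cohen's d from two independent normal samples of size n each and
    theta is the population Cohen's d."""
    rng = np.random.default_rng(seed)
    x1 = rng.normal(theta, 1.0, size=(reps, n))  # group 1: mean theta, sd 1
    x2 = rng.normal(0.0, 1.0, size=(reps, n))    # group 2: mean 0, sd 1
    pooled_sd = np.sqrt((x1.var(axis=1, ddof=1) + x2.var(axis=1, ddof=1)) / 2)
    d = (x1.mean(axis=1) - x2.mean(axis=1)) / pooled_sd
    return float(np.mean(np.abs(d - theta) <= f))

# How often does d land within f = 0.1 of theta = 0.5 for n = 100 vs n = 900?
print(coverage(100, theta=0.5, f=0.1))
print(coverage(900, theta=0.5, f=0.1))
```

With n = 100 per group the sample d lands within 0.1 of θ only about half the time; n must grow to several hundred per group before the 95% specification is met, which is exactly the kind of answer the APP formalizes.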

Preliminaries
Effect size is a statistical concept that measures the strength of the relationship between two variables on a numeric scale. For example, in medical education research studies that compare different educational interventions, effect size is the magnitude of the difference between groups. The absolute effect size is the difference between the average, or mean, outcomes in two different intervention groups. The standard deviation of the effect size is of critical importance, since it indicates how much uncertainty is included in the measurement. For more details and applications, see Sullivan and Feinn (2012), Schafer and Schwarz (2019) and Bhandari (2020).
Cohen's d is one of the most common ways to measure effect size: the difference between two population means divided by a standard deviation for the data. Mathematically, Cohen's effect size is denoted by

θ = (μ1 − μ2)/σ, (2.1)

where μ1 and μ2 are the means of the two populations, and σ is the standard deviation based on either or both populations. The sample Cohen's d is defined as the difference between two sample means divided by a standard deviation for the data obtained from both samples:

d = (X̄1 − X̄2)/S, (2.2)

where X̄1 and X̄2 are the sample means and S, as defined by Jacob Cohen, is the pooled standard deviation (for two independent samples)

S = sqrt( ((n1 − 1)S1² + (n2 − 1)S2²) / (n1 + n2 − 2) ),

and n1, S1², n2, S2² are the sample sizes and sample variances of the two independent samples, respectively.
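The sample Cohen's d with the pooled standard deviation can be computed directly from two samples; the short function below is our sketch (the function name and the toy data are ours, not from the paper).

```python
import math

def cohens_d(x1, x2):
    """Sample Cohen's d for two independent samples, using the pooled SD."""
    n1, n2 = len(x1), len(x2)
    m1 = sum(x1) / n1
    m2 = sum(x2) / n2
    # Unbiased sample variances S1^2 and S2^2.
    s1 = sum((x - m1) ** 2 for x in x1) / (n1 - 1)
    s2 = sum((x - m2) ** 2 for x in x2) / (n2 - 1)
    # Pooled SD: S^2 = ((n1-1)S1^2 + (n2-1)S2^2) / (n1 + n2 - 2).
    s_pooled = math.sqrt(((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2))
    return (m1 - m2) / s_pooled

print(cohens_d([1, 2, 3], [2, 3, 4]))  # means differ by one pooled SD: -1.0
```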
Note that confidence intervals of standardized effect sizes, especially Cohen's d, rely on the calculation of confidence intervals of noncentrality parameters. In order to find the minimum sample sizes for estimating Cohen's effect size θ given in (2.1) by the sample Cohen's d given in (2.2), we need the following definition.
Definition 2.1. Let Z and U be independent random variables, Z ∼ N(λ, 1), the normal distribution with mean λ and standard deviation 1, and U ∼ χ²_m, the chi-square distribution with m degrees of freedom. The random variable

T = Z / sqrt(U/m)

is said to have a noncentral t-distribution with m degrees of freedom and noncentrality parameter λ, denoted by T ∼ t_m(λ).
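Definition 2.1 is constructive, so it can be checked directly by simulation. The sketch below is ours (the paper does not use NumPy/SciPy); it builds T from its ingredients Z and U and compares the empirical mean with a library implementation of the noncentral t.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
m, lam = 12, 0.8
size = 200_000
# T = Z / sqrt(U/m), with Z ~ N(lam, 1) and U ~ chi-square(m) independent.
z = rng.normal(lam, 1.0, size=size)
u = rng.chisquare(m, size=size)
t = z / np.sqrt(u / m)
# The empirical mean should agree with the mean of t_m(lam).
print(t.mean(), stats.nct.mean(m, lam))
```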
It is easy to obtain the following properties of T ∼ t_m(λ) (see Nguyen and Wang, 2008).

(1) The probability density function (pdf) of T is given by

f_T(t) = [m^{m/2} e^{−λ²/2} / (√π Γ(m/2) (m + t²)^{(m+1)/2})] · Σ_{j=0}^{∞} [Γ((m + j + 1)/2) / j!] · (√2 λt / √(m + t²))^j. (2.3)

(2) The mean and variance of T are

E(T) = sqrt(m/2) · [Γ((m − 1)/2)/Γ(m/2)] · λ, for m > 1,

and

Var(T) = m(1 + λ²)/(m − 2) − [E(T)]², for m > 2,

respectively. For convenience, if we use the correction factor J(m) given by

J(m) = sqrt(m/2) · Γ((m − 1)/2)/Γ(m/2), (2.4)

then the mean and variance of T are

E(T) = J(m) λ, for m > 1, (2.5)

and

Var(T) = m(1 + λ²)/(m − 2) − [J(m)]² λ², for m > 2.

We will use these results in the proofs of our main results in the next section.
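The correction factor J(m) in (2.4) and the moment formulas above are easy to evaluate numerically. The sketch below is our code (function names are ours); it computes J(m) and cross-checks the resulting mean and variance against SciPy's noncentral t.

```python
import math
from scipy import stats

def J(m):
    """Correction factor J(m) = sqrt(m/2) * Gamma((m-1)/2) / Gamma(m/2)."""
    return math.sqrt(m / 2) * math.gamma((m - 1) / 2) / math.gamma(m / 2)

def nct_mean(m, lam):
    """E(T) = J(m) * lambda for T ~ t_m(lambda), valid for m > 1."""
    return J(m) * lam

def nct_var(m, lam):
    """Var(T) = m(1 + lambda^2)/(m-2) - [J(m) lambda]^2, valid for m > 2."""
    return m * (1 + lam ** 2) / (m - 2) - (J(m) * lam) ** 2

# Cross-check against scipy's noncentral t moments.
m, lam = 20, 0.5
print(nct_mean(m, lam), stats.nct.mean(m, lam))
print(nct_var(m, lam), stats.nct.var(m, lam))
```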
Proposition 2.2. Let (X_{11}, X_{21})′, . . ., (X_{1n}, X_{2n})′ be a random sample of size n from a bivariate normal population with mean vector μ and covariance matrix Σ. Let D_i = X_{1i} − X_{2i}, i = 1, . . ., n, and let θ_D = (μ1 − μ2)/σ and d_D = D̄/S_D be the Cohen's effect sizes of the population and the matched sample, respectively, where D̄ and S_D² are the sample mean and variance of the D_i's. Then

√n d_D ∼ t_{n−1}(λ_D), with λ_D = √n (μ1 − μ2)/σ_D,

where σ_D² = Var(D_i), and D̄ and S_D² are independent.

Proof. Since the D_i's are independent N(μ1 − μ2, σ_D²) random variables, √n D̄/σ_D ∼ N(√n (μ1 − μ2)/σ_D, 1), (n − 1)S_D²/σ_D² ∼ χ²_{n−1}, and D̄ and S_D² are independent. Therefore, by Definition 2.1,

√n d_D = (√n D̄/σ_D) / sqrt(S_D²/σ_D²) ∼ t_{n−1}(λ_D),

so that the desired result follows. ∎

Remark 2.1. The graphs of density curves of t_m(λ) with different mean values J(m)λ and different degrees of freedom m are given in Figures 1 and 2, respectively. From the graphs, we see that (i) the density curves are approximately symmetric about the mean J(m)λ, which is a function of m, so that equal-tailed confidence intervals should be the best choice, and (ii) the density curves tend to N(λ, 1) as m → ∞.

Remark 2.2. For a given confidence level c = 1 − α, the c·100% confidence intervals of λ based on T1 with m1 degrees of freedom and on T2 with m2 degrees of freedom, where m1 < m2, satisfy t_{m1,(1−c)/2} ≥ t_{m2,(1−c)/2}, where t_{m,(1−c)/2} is the critical value of the t-distribution with m degrees of freedom; that is, the interval based on the smaller degrees of freedom is at least as wide. More details on the confidence interval of Cohen's effect size θ will be given in the next section.
The APP methods for estimating both θ and θ_D

In this section, we apply the APP methods for estimating Cohen's effect size θ in the independent samples case and θ_D in the matched sample case.

The minimum sample size required for estimating θ
First, we consider two independent samples from two normal populations N(μ1, σ1²) and N(μ2, σ2²) with equal unknown variances: σ1² = σ2² = σ².

Theorem 3.1. Let X_{11}, . . ., X_{1n1} be a random sample of size n1 from N(μ1, σ²) and X_{21}, . . ., X_{2n2} be a random sample of size n2 from N(μ2, σ²), and assume that the two samples are independent. Let c be the confidence level and f be the precision, where J(n1 + n2 − 2) is the correction factor given in (2.4), and θ and d are given in (2.1) and (2.2). Let n = min{n1, n2} and let f_T(·) be the density of the noncentral t-distribution with 2(n − 1) degrees of freedom and noncentrality parameter λ* = sqrt(n/2) θ. Then the required sample size n can be obtained by solving

P(|d* − E(d*)| ≤ f) = ∫_{−f/σ_{d*}}^{f/σ_{d*}} f_{H*}(h) dh = c, (3.2)

where d* = (X̄1* − X̄2*)/S*, X̄1*, S1*² and X̄2*, S2*² are the sample means and variances of two independent samples with the same sample size n, respectively, and S*² = (S1*² + S2*²)/2. Here H* = (d* − E(d*))/σ_{d*} has density

f_{H*}(h) = σ_{d*} sqrt(n/2) f_T( sqrt(n/2) (σ_{d*} h + E(d*)) ). (3.3)

Proof. By Proposition 2.1, we know that T1 = sqrt(n1 n2/(n1 + n2)) d ∼ t_{n1+n2−2}(λ), where λ = sqrt(n1 n2/(n1 + n2)) θ. Thus, the mean and variance of d are given, respectively, by

E(d) = J(n1 + n2 − 2) θ

and

σ_d² = [(n1 + n2)/(n1 n2)] · [(n1 + n2 − 2)/(n1 + n2 − 4)] · (1 + (n1 n2/(n1 + n2)) θ²) − [J(n1 + n2 − 2)]² θ².

Now it is easy to obtain the density of H = (d − E(d))/σ_d, which is symmetric about 0 and given by

f_H(h) = σ_d sqrt(n1 n2/(n1 + n2)) f_{T1}( sqrt(n1 n2/(n1 + n2)) (σ_d h + E(d)) ), (3.7)

where f_{T1} is the density of T1. Note that if we have two independent random samples of sizes n1 and n2, we can construct the 100c% confidence interval for θ based on (3.7). Now let d1 = √2 T2, where T2 has a noncentral t-distribution with n1 + n2 − 2 degrees of freedom and noncentrality parameter θ/√2; thus √2 T2/J(n1 + n2 − 2) is an unbiased estimator of θ. Note that the variance of d1 is

σ_{d1}² = 2 Var(T2) = [(n1 + n2 − 2)/(n1 + n2 − 4)] (2 + θ²) − [J(n1 + n2 − 2)]² θ². (3.8)

Therefore, we can set up the confidence interval for θ for a given confidence level c and precision f, which is given in (3.1).
Since there are two unknowns n1 and n2, Equation (3.1) cannot be solved directly, so we need to modify it. Let n = min{n1, n2}, so that the degrees of freedom satisfy n1 + n2 − 2 ≥ 2n − 2. Suppose that we have two independent samples both of size n; then S*² = (S1² + S2²)/2, and sqrt(n/2) d* = sqrt(n/2) (X̄1 − X̄2)/S* has the noncentral t-distribution with 2(n − 1) degrees of freedom and noncentrality parameter λ* = sqrt(n/2) θ, so that the mean and variance of d* are

E(d*) = J(2(n − 1)) θ

and

σ_{d*}² = (2/n) · [(n − 1)/(n − 2)] · (1 + (n/2) θ²) − [J(2(n − 1))]² θ².

Thus, the standardized random variable H* = (d* − E(d*))/σ_{d*} has the density given in Equation (3.3). Therefore, the required n can be obtained by solving Equation (3.2). Similarly to Remark 2.2, we know that H_{2(n−1),(1−c)/2} ≥ H_{n1+n2−2,(1−c)/2}, where H_{m,(1−c)/2} is the critical value of the distribution of H with m degrees of freedom, so that the desired result follows. ∎

Remark 3.1. The required n obtained in Theorem 3.1 is unique. Also, if the conditions in Theorem 3.1 are satisfied, we can construct the c × 100% confidence interval (3.10) for θ, where σ_{d1*} is given in (3.4) and θ = θ0 is obtained from previous data; otherwise the default is θ = 0.
Remark 3.2. In order to see that Equation (3.9) holds numerically, we provide the probabilities of the c = 95% confidence intervals for different sample sizes n1 and n2 in Table 1.

Remark 3.3. Researchers can access the website https://appcohensd.shinyapps.io/independent/ to obtain the required sample size. The input variables are the value of θ0 (obtained from previous data, or θ0 = 0 by default), the precision f and the confidence level c. The output variable is the required sample size n = min{n1, n2}. The required sample sizes for precisions f = 0.1, 0.15, 0.2, 0.25, confidence levels c = 0.95, 0.90 and θ0 = 0, 0.1, . . ., 1 in the independent case are given in Table 2. The relationship between the required sample size n and the parameter θ for different values of the precision f is shown in Figure 3.
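In the spirit of Theorem 3.1 and the online calculator, the required per-group sample size can also be found with a short numerical search. The sketch below is our code, not the authors' program: it centers the precision interval at θ0 itself rather than at E(d*), so its output need not reproduce Table 2 exactly. It uses the fact that for equal group sizes n, sqrt(n/2) d ∼ t_{2(n−1)}(sqrt(n/2) θ0).

```python
from scipy import stats

def required_n(theta0, f, c, n_max=10_000):
    """Smallest per-group n with P(|d - theta0| <= f) >= c, using
    sqrt(n/2) * d ~ noncentral t with 2(n-1) df and nc = sqrt(n/2)*theta0."""
    for n in range(4, n_max):
        scale = (n / 2) ** 0.5
        dist = stats.nct(2 * (n - 1), scale * theta0)
        # P(theta0 - f <= d <= theta0 + f) under the noncentral t law of d.
        p = dist.cdf(scale * (theta0 + f)) - dist.cdf(scale * (theta0 - f))
        if p >= c:
            return n
    return None

# Illustrative call: precision f = 0.25, confidence c = 0.95, theta0 = 0.
print(required_n(theta0=0.0, f=0.25, c=0.95))
```

As expected, tightening the precision (smaller f) sharply increases the required n, mirroring the pattern in Table 2.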

The minimum sample size of Cohen's d needed for a given sampling precision in matched samples
Theorem 3.2. Let (X_{11}, X_{21})′, . . ., (X_{1n}, X_{2n})′ be a random sample of size n from a bivariate normal population with mean vector μ and covariance matrix Σ, where

μ = (μ1, μ2)′ and Σ = σ² [ 1  ρ ; ρ  1 ].

Denote D_i = X_{1i} − X_{2i}, i = 1, . . ., n. Let θ_D = (μ1 − μ2)/σ and d_D = D̄/S_D be the Cohen's effect sizes of the population and the matched sample, respectively, where D̄ and S_D² are the sample mean and variance of the D_i's. The density function of √n d_D is f_T, the density of the noncentral t-distribution with n − 1 degrees of freedom and noncentrality parameter λ_D = sqrt(n/(2(1 − ρ))) θ_D. Also, the mean and variance of d_D are given by

E(d_D) = J(n − 1) θ_D / sqrt(2(1 − ρ))

and

σ_{d_D}² = (1/n) [ (n − 1)(1 + λ_D²)/(n − 3) − [J(n − 1)]² λ_D² ].

Let c be the confidence level and f be the precision. Then the required sample size n can be obtained by solving Equation (3.12), where f_H is the density of H = [d_D − E(d_D)]/σ_{d_D} given by (3.14).

Proof. By Proposition 2.2, we know that T = √n d_D ∼ t_{n−1}(λ_D).

It is easy to see that the mean and variance of d_D are as given above. Thus, the density of H is given in Equation (3.13). From the distribution of H, we can obtain the required sample size n, which is given in Equations (3.12)–(3.14), so that the desired result follows. ∎

Remark 3.4. The value of n obtained is unique for a given f. Also, if the conditions in Theorem 3.2 are satisfied, we can construct a c × 100% confidence interval for θ_D, where θ_D = θ_{D0} is obtained from previous data; otherwise the default is θ_D = 0.
Remark 3.5. Researchers can access the website https://appcohensd.shinyapps.io/matched/ to obtain the required sample size. The input variables are the confidence level c, the precision f, the correlation coefficient ρ and θ_{D0} obtained from previous data; the default value is θ_{D0} = 0. The output is the required sample size n. Table 3 provides n for c = 0.90, 0.95 and different values of f and θ_{D0}. The relationship between n and θ_D is shown in Figure 4.
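The distributional claim underlying the matched case, that √n d_D follows t_{n−1}(λ_D) with λ_D = sqrt(n/(2(1 − ρ))) θ_D, can be checked by simulating matched pairs. The sketch below is ours; the parameter values n = 30, σ = 2, ρ = 0.6 and θ_D = 0.5 are illustrative.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n, sigma, rho, theta_d = 30, 2.0, 0.6, 0.5
mu1, mu2 = theta_d * sigma, 0.0          # so (mu1 - mu2)/sigma = theta_d
cov = sigma ** 2 * np.array([[1.0, rho], [rho, 1.0]])

reps = 50_000
x = rng.multivariate_normal([mu1, mu2], cov, size=(reps, n))  # (reps, n, 2)
d_i = x[..., 0] - x[..., 1]                        # paired differences D_i
d_D = d_i.mean(axis=1) / d_i.std(axis=1, ddof=1)   # matched-sample Cohen's d
t_vals = np.sqrt(n) * d_D

lam = np.sqrt(n / (2 * (1 - rho))) * theta_d
# Empirical mean of sqrt(n)*d_D vs the t_{n-1}(lam) mean.
print(t_vals.mean(), stats.nct.mean(n - 1, lam))
```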

Simulation and real data examples
In this section, we provide simulation results and two real data examples to support our main results. Based on M = 100,000 runs, the coverage rates of the confidence intervals and the corresponding point estimates of θ in the independent case and θ_D for the matched data are given in Tables 4 and 5. From these two tables, we can see that our APP procedures perform very well and the biases are very small.
To evaluate our results, we provide a real data example for each of the independent and matched sample cases. For the independent case, the estimated distributions based on the data set are approximately N(8.0868, 1.1099²) for population 1 and N(9.5273, 1.4115²) for population 2 (in units of $10,000) (see Figures 5 and 7). The Q–Q plots of the data sets are given in Figures 6 and 8; in both plots the points lie close to the line with no obvious pattern of departure. We then consider the 95% confidence interval of θ with precision f = 0.25 and default θ = 0. For the matched case, we consider the data set named "Rugby" from the package "PairedData" in R by Champely (2018). This data set provides the ratings, on a continuous ten-point scale, of two experts on 93 actions during several rugby union matches. Let D be the difference between the ratings of the two experts. The histogram and estimated density curve of D are given in Figure 9. From the data, we obtain D̄ = −0.3011 and S_D = 1.4872. The scatter plot in Figure 10 shows a positive correlation between the two experts' ratings.
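From the Rugby summary statistics reported above, the matched-sample Cohen's d follows in one line; this small sketch (our code) reproduces the point estimate d_D = D̄/S_D.

```python
# Matched-sample Cohen's d from the Rugby summary statistics in the text:
# n = 93 paired ratings, mean difference -0.3011, SD of differences 1.4872.
n = 93
d_bar, s_d = -0.3011, 1.4872
d_D = d_bar / s_d   # d_D = D-bar / S_D
print(round(d_D, 4))
```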

Concluding remarks
Our goal was to derive ways to perform the APP with respect to Cohen's d for independent and matched samples. The present mathematics provide those derivations. In turn, computer simulations support the mathematical derivations. We also provide links to free and user-friendly programs to facilitate researchers performing the APP to determine sample sizes that meet their specifications for precision and confidence. An advantage of the programs is that even researchers who are unsophisticated in mathematics can nevertheless avail themselves of the APP's advantages.
In addition to the obvious benefit of helping researchers who wish to compute Cohen's d determine the sample sizes they need, the present mathematics provide an additional benefit. Specifically, the famous article in Science by the Open Science Collaboration (2015) included replications of studies in top psychology journals. They found that the average effect size in the replication cohort of studies was less than half that in the original cohort of studies. Thus, effect sizes tend not to replicate across study cohorts. Our suspicion is that one reason for irreproducibility is that sample sizes are too small and traditional power analyses are insufficient because they do not address the precision issue (Trafimow and Myüz, 2019; Trafimow et al., 2020b), though significance testing doubtless plays a role too. The present mathematics, along with the links to computer programs, provide a solution. We hope and expect that researchers who wish to use Cohen's d to index their effect sizes will be better able to determine appropriate sample sizes, and thereby increase reproducibility in the social sciences.