Benchmarking effect size measures for comparing the difference
between two independent group means
Limin Liu
Master dissertation submitted
to obtain the degree of
Master of Statistical Data Analysis
Promoter: Prof. Dr. Christophe Ley
Co-promoter: Prof. Dr. Christophe Leys
Tutor: Marie Delacre
Department name of the promoter:
Department of Applied Mathematics,
Computer Science and Statistics
The author and the promoter give permission to consult this master dissertation and to copy it
or parts of it for personal use. Each other use falls under the restrictions of the copyright, in
particular concerning the obligation to mention explicitly the source when using results of this
master dissertation.
FOREWORD
Reform of statistical practice in the social and behavioural sciences requires wider use of
effect size measures and the associated confidence intervals (CIs). However, the choice of
effect size measures and of approaches to build those CIs depends on the specific conditions, which makes it complex and challenging in practice. The present work provides a guideline for reporting the appropriate effect size estimator and the associated CI in the context of comparisons between two independent group means.
The development of this work has greatly benefited from the generous help of my promotor,
Prof. Christophe Ley. The core concepts discussed in this thesis such as bias, consistency,
type I error, power, and so on were originally taught in his course Statistical Inference. The
academic training through his course equipped me with a firm foundation for conducting
statistical analysis. I especially appreciate his remarks on the draft, as well as his great support with incisive explanations, encouragement and inspiration throughout my master study. My
co-promotor, Prof. Christophe Leys (Université libre de Bruxelles) is the initiator of this
research project. I am grateful for his guidance and instructions, as well as for his valuable comments on the draft from the psychological perspective. I am also grateful to my tutor Marie Delacre
(Université libre de Bruxelles) for lots of hours-long discussions on aspects of effect size
indices, type I error rate, Welch’s t-test, and her inputs on the draft. Her previous work on the
comparison between the two t-tests enlightened this study. Undoubtedly, this work is a joint
effort of psychologists and statisticians. I would also like to thank my colleague, Emma, for every great job we accomplished together and the countless fulfilling moments in our team adventure of statistics. I am grateful for the opportunity to follow this master program, and
many thanks to all the teachers for the numerous inspiring moments during the last three
semesters.
Last but definitely not least, my deepest appreciation to Xi, Isabella and Alexander. This work
would never have been accomplished without their generosity, understanding and support.
TABLE OF CONTENTS
1 Abstract
2 Introduction
2.1 Effect size - background knowledge
2.2 Effect size indices to compare two independent group means
2.2.1 Under the assumptions of normality and homogeneity of variances
2.2.2 Alternatives to Cohen's ds under heteroscedasticity
2.2.3 Alternatives to Cohen's ds under non-normality
2.3 Confidence intervals of effect size
2.4 Effect size, significance tests and power analysis
2.5 Objectives
3 Method
3.1 Key statistics for evaluation of effect size estimators and confidence intervals
3.1.1 Bias rate, variance and consistency of the point estimators
3.1.2 Confidence intervals
3.1.3 Type I error and power for the statistical tests
3.1.4 Summary
3.2 Methods of the Monte Carlo simulations
3.2.1 Simulation design for the study of point estimators
3.2.2 Simulation design for the study of confidence intervals
3.2.3 Simulation design for the study of hypothesis tests
4 Results
4.1 Bias, consistency and variance of the point estimators
4.1.1 Cohen's ds, Hedges' gs, Glass's ds and Shieh's d under Assumption 1
4.1.2 Cohen's ds, Hedges' gs, Glass's ds and Shieh's d under Assumption 2
4.1.3 Cohen's ds, Hedges' gs, Shieh's d, dMAD, dR and PSindep under Assumptions 3 and 4
4.2 Accuracy and precision of confidence intervals
4.3 Statistical tests to compare two groups
4.3.1 Type I error rates
4.3.2 Power analysis
5 Discussion
6 References
1 Abstract
The choice of effect size measures and approaches to build the associated confidence intervals
is highly dependent on specific conditions. However, few studies in the last several decades have properly reported effect sizes. Therefore, it is of great relevance to provide a guideline for researchers to choose proper effect size measures and their confidence intervals. In the context of group comparisons, Cohen's ds, which relies on the often untenable assumptions of normality and homogeneity of variances, is the dominant effect size measure used by researchers. However, the consequences of the arbitrary use of Cohen's ds when these assumptions are violated, and the superiority of alternatives to Cohen's ds under specific conditions, remain vague. This study conducted comprehensive Monte Carlo simulations to compare Cohen's ds and its alternatives under specific conditions across all assumptions, regarding the point estimators, confidence intervals and related statistical tests. Results show that Shieh's d is mostly preferred as a default effect size measure across all assumptions, because it always provides comparably lower bias rates and maintains higher precision compared to Cohen's ds, with the only exception under heteroscedasticity when the group with fewer observations has the larger variance. Furthermore, we analysed four confidence intervals constructed through the noncentral t-distribution (NC) and the bootstrap bias-corrected and accelerated strategy (BS BCa) in terms of coverage probability and precision. The NC interval around Shieh's d is considered the optimal estimate under the assumption of normality, while the BS BCa interval around Cohen's ds shows great advantages under non-parametric conditions, especially when the observations are adequate. Moreover, we compared two t-tests and observed that, on the one hand, Welch's t-test outperformed Student's t-test in type I error rate at a cost in power. On the other hand, Welch's t-test required fewer observations than Student's t-test to achieve the same power in most conditions, except when there are fewer observations in the group with the bigger standard deviation under the assumption of normality. This study contributes to the effect size literature by providing a guideline for choosing the appropriate effect size estimator and confidence interval approach to convey quantitative information in applied research.
2 Introduction
2.1 Effect size - background knowledge
Effect size measures are one of the most important outcomes of empirical studies for three
reasons (Lakens, 2013): 1) they allow researchers to present the magnitude of effect in a
standardised manner and to communicate the practical significance of their results instead of
only reporting the statistical significance; 2) they allow researchers to draw meta-analytic
conclusions by comparing standardised effect sizes across studies; 3) the effect sizes from
previous studies can be used in a priori power analysis when planning a new study. Therefore,
understanding and distinguishing the differences among diverse effect size measures is crucial to guide researchers in choosing proper estimators in certain contexts.
Effect sizes express the magnitude of an effect. Formally, Grissom & Kim (2005) used the
term “effect size” to describe the degree to which results differ from what is implied for them
by a null hypothesis. Later, Kelley & Preacher (2012) defined effect size as a quantitative reflection of the magnitude of some phenomenon that is used for the purpose of addressing a question of interest. According to the general review of Ferguson (2009), effect sizes can be categorised into four general classes associated with different research interests: 1) group difference (known as the Cohen's d family); 2) strength of association (variance explained, known as the r family); 3) corrected estimates (adjusted R², also belonging to the r family; r and R² are measures of effect size in their own right, since r covers the whole range of relationship strengths, from 0 to 1 or -1); and 4) risk estimates (categorical association, known as the c family). The c family covers the categorical effect sizes, including Phi (φ), which is equivalent to the correlation coefficient r for the goodness of fit in 2x2 contingency tables, and Cramer's V (φc) for contingency tables larger than a 2x2 design. These estimators also norm well from 0 to 1 regardless of table size.
Each category has its own calculation rules and assumptions, resulting in various versions of estimators in response to changing conditions. In order to evaluate the suitability of effect size estimators, three important statistical properties suggested in previous studies are explored in this study: unbiasedness, consistency and efficiency (Kelley & Preacher, 2012); in addition, interpretability (Cumming, 2013) is also discussed afterwards. Moreover, a confidence interval around the effect size point estimator also serves as a null hypothesis statistical test and provides a far better understanding of the results than does a simple
significance answer. Therefore, we are also motivated to inspect accuracy and precision
properties of different interval estimations. Finally, as effect size measures originate from
statistical tests, comparisons of type I error rates and powers of different tests also deserve our
attention.
2.2 Effect size indices to compare two independent group means
Our study focuses on the effect size measures in the context of between-group comparisons.
We first go over the effect size indices interpreted as standardised mean differences under the assumptions of normality and homogeneity of variances. Next, we review the alternatives for when either or both of these assumptions are violated.
2.2.1 Under the assumptions of normality and homogeneity of variances
In the context of group means comparisons, the most commonly used and perhaps the most
intuitively appealing effect size is the standardised difference of an effect, typically termed
Cohen’s d (Cohen, 1988) which indicates the entire family of group difference effect sizes. In
association with different experimental designs, there are various versions of Cohen’s d
denoted by different subscripts (Cohen, 1988; Lakens, 2013). We limit our review to
“between-subject” designs where individuals are randomly assigned into groups under
different experimental conditions. The main task under this study design is to compare the
behaviour of those in one experimental condition with the behaviour of those in another
(Charness et al., 2012). Unpaired t-tests (or independent t-tests since the subjects in one group
are not related to those in another group) are usually applied to serve this type of experimental
design.
• Cohen's δ - population effect size
The population-standardised difference between two group means is defined by Cohen (1988) as:
Cohen's δ = (μ1 − μ2) / σ    (1)
where both populations follow a normal distribution with mean μj in the jth group (j = 1, 2) and common standard deviation σ.
• Cohen's ds - estimator of Cohen's δ
In reality, researchers seldom have access to the population standard deviation σ. Therefore, we normally estimate Cohen's δ by using the sample estimates. Cohen refers to
the standardized mean difference between two groups of independent observations for the
sample as ds, which is given by:
Cohen's ds = (X̄1 − X̄2) / √[((n1 − 1)SD1² + (n2 − 1)SD2²) / (n1 + n2 − 2)]    (2)
where the numerator is the difference between the two observed group means (X̄j is the sample mean of the jth group (j = 1, 2)); the denominator is the pooled standard deviation (the square root of the pooled variance), assuming equal standard deviations in the two populations.
• Hedges' gs - bias-corrected version of Cohen's ds
As Cohen's ds is based on sample averages, not surprisingly, it gives a biased estimate of the population effect size when the variety of the data is reduced by limited observations (n < 20). Previous research has shown that Cohen's ds tends to be larger than the population value in the long run if it is estimated from a small sample (Lakens, 2013; Cumming & Jageman, 2017). This over-estimation is due to a bias of the SD, which tends to be lower than the population SD. Because the mean is not biased, dividing it by an under-estimated SD leads to an over-estimate of Cohen's δ. The recommended bias-corrected version of Cohen's ds was originally defined by Hedges and Olkin (1985) and is calculated as:
Hedges' gs = Cohen's ds × (1 − 3 / (4(n1 + n2) − 9))    (3)
Obviously, as long as researchers report the number of participants in each condition of a between-subjects comparison and the t-value, Cohen's ds and Hedges' gs can be calculated when data are normally distributed with equal standard deviations in both groups. These effect size indices serve well under the assumptions of normality and homogeneity of variance, but they may not be well advised for use with data that violate these assumptions (Grissom & Kim, 2001; Kelley, 2005; Shieh, 2013). Actually, the presence of unequal variances is a realistic concern in psychological research, and non-normality is also not rare in practice (Delacre et al., 2017). Therefore, if two treatments produce distributions with different variances and/or different shapes, more powerful and informative alternatives should be considered.
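For illustration, formulas (2) and (3) can be sketched in Python as follows (the simulations in this work are implemented in R; this translation and its function names are our own):

```python
import numpy as np

def cohens_ds(x1, x2):
    """Cohen's d_s (formula (2)): mean difference over the pooled SD,
    assuming equal population variances."""
    x1, x2 = np.asarray(x1, dtype=float), np.asarray(x2, dtype=float)
    n1, n2 = len(x1), len(x2)
    pooled_var = ((n1 - 1) * x1.var(ddof=1) + (n2 - 1) * x2.var(ddof=1)) / (n1 + n2 - 2)
    return (x1.mean() - x2.mean()) / np.sqrt(pooled_var)

def hedges_gs(x1, x2):
    """Hedges' g_s (formula (3)): small-sample bias-corrected Cohen's d_s."""
    n1, n2 = len(x1), len(x2)
    return cohens_ds(x1, x2) * (1 - 3 / (4 * (n1 + n2) - 9))
```

For n1 = n2 = 5, the correction factor is 1 − 3/31 ≈ 0.903, so gs shrinks ds by roughly 10%, reflecting the upward bias of ds in small samples.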
2.2.2 Alternatives to Cohen's ds under heteroscedasticity
• Glass's ds
Whenever standard deviations differ substantially between conditions, one well-known solution is to apply Glass's (1976) formula, using one of the two sample standard deviations as the standardizer for the standardised mean difference; for example, if we choose the standard deviation of the control group:
Glass's ds = (X̄experimental − X̄control) / SDcontrol    (4)
Hedges' corrected version of Glass's ds can be obtained by replacing Cohen's ds with Glass's ds in formula (3).
• Shieh’s d
The second approach considers Welch’s statistic for the well-known Behrens-Fisher problem
of comparing the difference between two normal means that may have unequal population
variances (Kim & Cohen, 1998). To extend the notion of effect size within the heteroscedastic
framework, Shieh (2013), following Kulinskaya and Staudte (2007), defined the effect size
estimator Shieh's d:
Shieh's d = (X̄1 − X̄2) / √(SD1²/q1 + SD2²/q2)    (5)
where qi = ni/N, i = {1, 2}, is the group size allocation ratio, and N = ∑ni. This effect size
estimator is a function of mean difference, variance components and allocation ratios. Unlike
the homogeneous-variance case, this method takes the sample size allocation ratios into account in the calculation of the group variance.
difference needs to accommodate the design characteristic of group allocation scheme under
the more sophisticated situation of heteroscedasticity. According to the statistical properties of
Welch’s statistic under heteroscedasticity, it does not appear possible to define a proper
standardised effect size without accounting for the relative group size of subpopulations in a
sampling scheme. We consider Shieh’s d as a robust effect size measure under
heteroscedasticity and are motivated to investigate its behaviour when observations come
from normal populations with unequal variances or even from non-normal distributions.
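A sketch of formula (5) in Python (illustrative, following the same conventions as the sketch of formula (2) above):

```python
import numpy as np

def shiehs_d(x1, x2):
    """Shieh's d (formula (5)): the standardizer weights each group's
    variance by the inverse of its allocation ratio q_i = n_i / N."""
    x1, x2 = np.asarray(x1, dtype=float), np.asarray(x2, dtype=float)
    n1, n2 = len(x1), len(x2)
    q1, q2 = n1 / (n1 + n2), n2 / (n1 + n2)
    return (x1.mean() - x2.mean()) / np.sqrt(x1.var(ddof=1) / q1 + x2.var(ddof=1) / q2)
```

Note that with balanced groups and equal variances the denominator equals twice the common SD, so Shieh's d is half of Cohen's ds; the two indices live on different scales and should not be compared directly.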
2.2.3 Alternatives to Cohen's ds under non-normality
If the sample sizes are large and the variances equal, one can expect valid results from Student's t-tests and Cohen's ds, since they are robust to violations of normality when the other assumptions are met and the null hypothesis is true (Boneau 1960, in Kelley 2005). However, these restrictive conditions are not always met in practice. If the sample sizes are limited (e.g., fewer than 20 observations in each group) or the two group means differ substantially, analysing a non-normal dataset with procedures that assume normality can have seriously misleading implications for the conclusions. Thus, researchers have been exploring appropriate alternatives for estimating group mean differences when the assumption of normality is violated. We review two types of approaches here: 1) the parametric alternatives to Cohen's ds obtained by performing t-tests (including Student's t-test and Welch's t-test); 2) the non-parametric solutions. To avoid possible confusion in practice, two remarks on the non-parametric tests have to be made: 1) the non-parametric tests do not have the same H0 as t-tests, since they assume equal distributions rather than equal group means; 2) the non-parametric tests provide distinct advantages when the sample observations are extremely limited and/or the assumption of normality is violated, but they are not guaranteed to be immune to heteroscedasticity. In this study, we take a non-parametric effect size measure into account, as it is expected to provide a standard reference in the comparisons among the various effect size estimators, especially under non-parametric conditions.
• Using trimmed medians and MAD
Hedges and Olkin (1985) proposed an effective solution to handle extremely skewed datasets. It is suggested to trim the highest and lowest scores from both groups, to replace the mean difference with the median difference, and to replace the standard deviation with the MAD (see detailed illustrations of the calculation of the MAD in Leys et al., 2013). Thus, one alternative version of equation (4) is given as:
dMAD = (Mdnexp − Mdnctr) / MADctr    (6)
where Mdnexp and Mdnctr are the sample medians of the experimental and the control group. Obviously, when dealing with a sufficiently heavy-tailed distribution where the number of outliers tends to be large, comparing medians can result in larger power than methods based on the means.
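A sketch of formula (6) in Python. The 1.4826 consistency constant, which makes the MAD comparable to an SD under normality (Leys et al., 2013), is an assumption of this sketch rather than a detail stated above; set scale=1.0 for the raw MAD.

```python
import numpy as np

def d_mad(x_exp, x_ctr, scale=1.4826):
    """d_MAD (formula (6)): median difference standardised by the control
    group's MAD; scale=1.4826 makes the MAD consistent with the SD under
    normality (an assumed convention here)."""
    x_exp, x_ctr = np.asarray(x_exp, dtype=float), np.asarray(x_ctr, dtype=float)
    mad_ctr = scale * np.median(np.abs(x_ctr - np.median(x_ctr)))
    return (np.median(x_exp) - np.median(x_ctr)) / mad_ctr
```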
• Using the trimmed mean and the winsorized variance
Another robust version of Cohen's ds is constructed by applying the 20% trimmed means and the pooled 20% winsorized variance (Algina et al., 2005), which can be given as:
dR = 0.642 × (X̄t1 − X̄t2) / SDW    (7)
where X̄tj (j = 1, 2) is the 20% trimmed mean of the jth group and SDW is the square root of the pooled 20% winsorized variance of the two samples, that is,
SDW² = ((n1 − 1)SDW1² + (n2 − 1)SDW2²) / (n1 + n2 − 2)    (8)
In formula (8), SDWj is the 20% winsorized standard deviation in the jth group. The factor 0.642 in formula (7) ensures that δR = Cohen's δ when the data are drawn from normal distributions with equal variances (see detailed descriptions of the calculation of the trimmed mean and the winsorized variance in Algina et al., 2005). A 20% trimming percentage is recommended as a common choice based on comprehensive considerations of outlier removal, robustness to heterogeneity and non-normality, type I error control, and statistical power (see the justification of 20% trimming in Keselman et al., 2002).
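Formulas (7) and (8) can be sketched as follows (illustrative Python; removing g = ⌊0.2n⌋ observations from each tail is one common trimming convention, assumed here, and readers should verify the exact convention against Algina et al., 2005):

```python
import numpy as np

def trimmed_mean(x, prop=0.2):
    """Mean after removing the g = floor(prop * n) smallest and largest values."""
    x = np.sort(np.asarray(x, dtype=float))
    g = int(np.floor(prop * len(x)))
    return x[g:len(x) - g].mean()

def winsorized_var(x, prop=0.2):
    """Sample variance after pulling each tail in to the nearest retained value."""
    x = np.sort(np.asarray(x, dtype=float))
    g = int(np.floor(prop * len(x)))
    w = x.copy()
    w[:g] = x[g]                       # replace the g smallest by the next value up
    w[len(x) - g:] = x[len(x) - g - 1]  # replace the g largest by the next value down
    return w.var(ddof=1)

def d_r(x1, x2, prop=0.2):
    """Robust d_R (formulas (7)-(8)): trimmed-mean difference over the pooled
    winsorized SD, rescaled by 0.642 to match Cohen's delta under normality."""
    n1, n2 = len(x1), len(x2)
    sw2 = ((n1 - 1) * winsorized_var(x1, prop) +
           (n2 - 1) * winsorized_var(x2, prop)) / (n1 + n2 - 2)
    return 0.642 * (trimmed_mean(x1, prop) - trimmed_mean(x2, prop)) / np.sqrt(sw2)
```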
• Mann-Whitney effect size
As previously mentioned, in the case of independent group comparisons, classic non-parametric tests such as the Mann-Whitney U-test are widely applied as alternatives to t-tests without assuming normality, although they do not examine exactly the same hypotheses. For the effect size estimator, Grissom and Kim (2012) suggest obtaining the Mann-Whitney U statistic and then dividing it by the product of the two sample sizes:
PSindep = U / (n1 n2)    (9)
The U-value represents the number of times observations in group 1 precede observations in group 2 in the ranking. This effect size estimates the probability that a score randomly drawn from group 1 will be greater than a score randomly drawn from group 2. Remarkably, similar to the fact that the Mann-Whitney U test does not compare the means of two groups, this effect size measure is not a standardised mean difference index. On the other hand, since the Mann-Whitney U test is not immune to heteroscedasticity, this effect size estimator is not necessarily a convincing option when the assumptions of normality and homogeneity of
variances are both violated. We will further explore the appropriateness of this effect size measure across assumptions in the simulation study. We expect this effect size indicator to serve as a standard reference for the other alternatives, especially under non-normality and with limited observations, since it is a recommended effect size measure with no restrictions on normality or sample sizes.
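Formula (9) can be computed directly from pairwise comparisons, without calling a test routine (an illustrative sketch; counting ties as one half is one common convention for U):

```python
import numpy as np

def ps_indep(x1, x2):
    """PS_indep (formula (9)): estimated probability that a random score from
    group 1 exceeds a random score from group 2, i.e. U / (n1 * n2)."""
    x1, x2 = np.asarray(x1, dtype=float), np.asarray(x2, dtype=float)
    greater = (x1[:, None] > x2[None, :]).sum()   # pairs where group 1 wins
    ties = (x1[:, None] == x2[None, :]).sum()     # tied pairs count as 1/2
    u = greater + 0.5 * ties
    return u / (len(x1) * len(x2))
```

Two identical samples give exactly 0.5 (no stochastic superiority), and the measure approaches 1 as group 1 comes to dominate group 2.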
In summary, the proposed effect size measures to compare two independent groups are integrated in Table 2.1. Briefly, Cohen's ds is applied under the assumptions of normality and homogeneity of variances, while Glass's ds and Shieh's d are proposed when we cannot assume equal variances in the two groups. If the assumption of normality is violated, parametric effect size indices such as dMAD and dR and non-parametric alternatives such as PSindep are suggested, each with distinct advantages in dealing with specific situations.
2.3 Confidence intervals of effect size
The reporting of a confidence interval (CI) around a point estimate from a sample provides
complete information of the primary results of interest. Cumming & Finch (2001) highlighted
four reasons to use CIs: 1) They give point and interval information that is accessible and
comprehensible and so they support substantive understanding and interpretation. 2) There is
a direct link between CIs and familiar null hypothesis significance testing (NHST): Noting
that an interval excludes a value is equivalent to rejecting a hypothesis that asserts that value
as true - at a significance level related to that critical value C. A CI may be regarded as the set
of hypothetical population values consistent, in this sense, with the data. 3) CIs are useful in
the cumulation of evidence over experiments: They support meta-analysis and meta-analytic
thinking focused on estimation. 4) CIs give information about the precision of the observed
estimate. They can be estimated before conducting an experiment and the width used to guide
the choice of design and sample size. After the experiment, they give information about
precision that may be more useful and accessible than a statistical power value.
Table 2.1 Summary of effect size estimators

Assumption: Normality & homogeneity of variances
  Cohen's ds = (X̄1 − X̄2) / √[((n1 − 1)SD1² + (n2 − 1)SD2²) / (n1 + n2 − 2)]
  Hedges' gs = Cohen's ds × (1 − 3 / (4(n1 + n2) − 9))

Assumption: Homogeneity of variances is violated
  Shieh's d = (X̄1 − X̄2) / √(SD1²/q1 + SD2²/q2)
  Glass's ds = (X̄experimental − X̄control) / SDcontrol

Assumption: Normality is violated
  Parametric: dR = 0.642 × (X̄t1 − X̄t2) / SDW and dMAD = (Mdnexp − Mdnctr) / MADctr
  Non-parametric: PSindep = U / (n1 n2)
In practice, there are mainly two approaches to construct a CI around the point estimators of effect size: the parametric method based on the noncentral t-distribution, and the non-parametric solution applying bootstrapping procedures. Within the bootstrap framework, two approaches to find the confidence limits are delineated, and they have different statistical properties: 1) the percentile method, which is first-order accurate because the error of the confidence interval coverage percentage approaches zero at a rate related to 1/√min(n1, n2) (see detailed explanations in Kelley, 2005); 2) the bias-corrected and accelerated (BCa) strategy, which is second-order accurate because the over- or undercoverage of the 100(1 − α)% BCa confidence interval approaches zero at a rate related to 1/min(n1, n2), which is smaller than that of the bootstrapped percentile interval. Therefore, the second type of bootstrap confidence interval is generally recommended, and its advantage over the percentile approach in terms of coverage probability and precision is convincingly supported by evidence (Kelley, 2005). In our study, we decided to follow the BCa procedure to build the non-parametric intervals for the comparisons between parametric and non-parametric intervals. We will illustrate the concrete calculation steps in the Methods section.
2.4 Effect size, significance tests and power analysis
Any statistical relationship in a sample can be interpreted in two ways: 1) the relationship in the sample reflects a relationship in the population; 2) the relationship in the sample only reflects sampling error and there is actually no relationship in the population.
Null hypothesis testing is the formal approach to deciding between these two interpretations of a statistical relationship with the available observations. The idea that there is no relationship in the population is usually formulated as the null hypothesis (often denoted as H0), while the other interpretation is converted into the alternative hypothesis (often denoted as Ha or H1). The crucial step of null hypothesis testing is to find the likelihood of the sample results under H0. This probability is called the p-value. Higher p-values indicate that the data are consistent with H0, while extremely low p-values provide evidence against H0 and lead to its rejection. With any statistical test, however, there is always the possibility that we will find a difference between two groups when it does not actually exist. This is called a Type I error (its probability is denoted as α). Likewise, it is possible that when a difference does exist, the test will not be able to identify it. This type of mistake is called a Type II error (denoted as β). The statistical power (1 − β) refers to the probability that the test will find a statistically significant difference when such a difference actually exists. In other words, power is the probability that we correctly reject the null hypothesis (and thus avoid a Type II error).
Effect sizes are always associated with inferential statistical tests. Once a test suggests a significant effect, the effect size needs to be calculated to confirm whether this statistically significant effect is also practically meaningful or important for decision-making. Moreover, the effect size influences the power of a test; in practice, effect size is used in power analysis to determine the sample size. On the other hand, different tests may vary in their rates of type I and type II errors under different assumptions. In the context of comparisons of two independent groups, Student's t-test and Welch's t-test are the most commonly used, along with the alternative Mann-Whitney U test when data are not normally distributed. Detecting the differences among these tests will also enhance our understanding of effect size.
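The contrast between the two t-tests under heteroscedasticity is easy to demonstrate by simulation (an illustrative Python sketch; our simulations use R, and the sample sizes and SDs below are arbitrary choices for the demonstration):

```python
import numpy as np
from scipy import stats

def type1_rates(n1=20, n2=40, sd1=2.0, sd2=1.0, reps=4000, alpha=0.05, seed=42):
    """Empirical type I error rates of Student's and Welch's t-tests when H0
    (equal means) is true but the smaller group has the larger SD."""
    rng = np.random.default_rng(seed)
    rej_student = 0
    rej_welch = 0
    for _ in range(reps):
        x1 = rng.normal(0.0, sd1, n1)
        x2 = rng.normal(0.0, sd2, n2)
        rej_student += stats.ttest_ind(x1, x2, equal_var=True).pvalue < alpha
        rej_welch += stats.ttest_ind(x1, x2, equal_var=False).pvalue < alpha
    return rej_student / reps, rej_welch / reps
```

Under this configuration Student's t-test rejects a true H0 well above the nominal 5% level, while Welch's t-test stays close to it; reversing which group has the larger SD makes Student's test conservative instead.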
2.5 Objectives
In order to conduct a comprehensive study on effect size measures and confidence interval
approaches in the context of group comparisons, we designed the simulation study with three
main interests: 1) to observe the behaviour of Cohen's ds and the proposed alternatives under different assumptions; 2) to detect the differences between the parametric and non-parametric approaches for constructing confidence intervals; 3) to explore the differences among hypothesis tests for comparing two independent groups and their association with effect size. All these analyses serve to find the appropriate effect size measure(s) and associated CI approach(es) under distinct assumptions.
3 Method
In this section, we first go over the key statistics that we defined and applied to examine the statistical properties of the effect size point estimators and CIs. Then, we describe the study design of the Monte Carlo simulations. Statistics and approaches that are more likely to differ from common practice, or that may be less familiar to scientific readers, receive more detailed explanations in this section. On the other hand, the calculation of well-known statistics such as means, variances and standard deviations is not specified, as we assume these are well established in the scientific community.
3.1 Key statistics for evaluation of effect size estimators and confidence intervals
3.1.1 Bias rate, variance and consistency of the point estimators
An estimator is unbiased if, on average, it hits the true parameter value. That is, the mean of the sampling distribution of the estimator is equal to the true parameter value:
E(estimator) = parameter
To gauge the bias of the sample estimators, we define the bias rate statistic as follows:
rbias = (E(d) − δ) / δ    (10)
where E(d) is the average of the effect size estimates generated by a specific effect size estimator (such as Cohen's ds, Shieh's d, Glass's ds, etc.) and δ is the true effect size for the population. In formula (10), the numerator captures the difference between the average of the sampled estimates and the true parameter; the denominator standardises this difference for the convenience of comparisons. The sign of rbias reflects over- or under-estimation by the estimator; the absolute value of rbias depicts the magnitude, or severity, of the bias. In our study, we expect to find effect size estimators whose bias rates are close to zero.
Besides the bias rate, another important statistical property of an estimator is its variance. Whereas the bias indicates whether the middle of the sampling distribution falls in line with the real parameter (a location-related property), the variance indicates how concentrated the estimates are around the real parameter (a shape-related property). Sometimes, we are willing to trade location for a "better", tighter shape around the real unknown parameter, so that we reduce the chance of unluckily "missing" the real parameter. Therefore,
we will take both the bias rate and the variance as important evaluation criteria in the comparisons among the various effect size estimators.
The third essential property of an estimator is consistency. An estimator is consistent if, as the sample size increases, the estimates it produces converge in probability to the true value of the parameter being estimated. To be slightly more precise, consistency means that, as the sample size increases, the sampling distribution of the estimator becomes increasingly concentrated at the true parameter value. In our case, we expect to find effect size estimators whose bias rates decrease with increasing sample sizes.
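The bias rate of formula (10) and the consistency property can both be checked with a small Monte Carlo experiment (an illustrative Python sketch; δ = 0.5 and the sample sizes are arbitrary choices for the demonstration):

```python
import numpy as np

def cohens_ds(x1, x2):
    """Cohen's d_s with the pooled SD (formula (2))."""
    n1, n2 = len(x1), len(x2)
    sp2 = ((n1 - 1) * np.var(x1, ddof=1) + (n2 - 1) * np.var(x2, ddof=1)) / (n1 + n2 - 2)
    return (np.mean(x1) - np.mean(x2)) / np.sqrt(sp2)

def bias_rate(n_per_group, delta=0.5, reps=10000, seed=7):
    """Monte Carlo estimate of r_bias (formula (10)) for Cohen's d_s under
    normality with equal variances; the true effect size is delta."""
    rng = np.random.default_rng(seed)
    est = [cohens_ds(rng.normal(delta, 1.0, n_per_group),
                     rng.normal(0.0, 1.0, n_per_group))
           for _ in range(reps)]
    return (np.mean(est) - delta) / delta
```

With n = 10 per group the bias rate is clearly positive (Cohen's ds overestimates δ in small samples), and it shrinks towards zero at n = 100, illustrating consistency.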
3.1.2 Confidence intervals
3.1.2.1 Four confidence intervals
To detect the differences between the parametric and non-parametric interval estimation approaches across conditions, we constructed two intervals around selected effect size point estimators following each of these two approaches: namely, noncentral-t confidence intervals around Cohen's ds and Shieh's d, and bootstrap bias-corrected and accelerated intervals around Cohen's ds and Hedges' gs (the R functions employed to compute the suggested confidence intervals are available in Syntax 1 in the appendix).
• NC t-distribution intervals in Student’s t-test with Cohen’s ds
Consider two independent random samples from two normal populations with means μ1 and μ2, and standard deviations σ1 and σ2, respectively. We wish to test the following hypothesis:

H0: μ1 = μ2 versus Ha: μ1 ≠ μ2

When we assume homogeneity of variances, the test is based on Student's t-statistic:

T = (X̄1 − X̄2) / √( [(n1 − 1)SD1² + (n2 − 1)SD2²] / (n1 + n2 − 2) × (1/n1 + 1/n2) )   (11)

where X̄1, X̄2, SD1, SD2 are the sample estimates of the unknown parameters μ1, μ2, σ1, σ2, respectively. When the null hypothesis is true, T follows a central t-distribution with ν = n1 + n2 − 2 degrees of freedom. However, when the null hypothesis is false, it follows a non-symmetric distribution known as the noncentral t-distribution with ν degrees of freedom and noncentrality parameter λ. The noncentrality parameter is a function of Cohen's ds and the sample sizes:
λ = Cohen's ds × √( n1 n2 / (n1 + n2) )   (12)

Tobs, the observed value of the Student's t-statistic defined in formula (11), is used to estimate the noncentrality parameter λ. By the confidence interval transformation principle (Cumming & Finch, 2001), finding the confidence limits for λ leads to the confidence limits for Cohen's ds. In brief, we first need to find the lower and upper confidence limits for λ (noted λL and λU). λL is obtained by finding the noncentrality parameter whose 1 − α/2 quantile is Tobs; likewise, λU is obtained by finding the noncentrality parameter whose α/2 quantile is Tobs. These lower and upper limits bracket λ with 100(1 − α)% confidence. Once the confidence limits for λ have been obtained, they can be transformed into confidence limits for Cohen's ds by applying formula (13). The confidence interval around Cohen's ds is computed in the following manner:

Prob[ λL √((n1 + n2)/(n1 n2)) ≤ δ ≤ λU √((n1 + n2)/(n1 n2)) ] = 1 − α   (13)

Thus, given that the statistical assumptions are met, equation (13) provides the 100(1 − α)% confidence limits around Cohen's ds.
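The pivoting steps above can be sketched in code. The thesis's actual implementation is the R code of Syntax 1; this Python sketch is illustrative only, and the root-finding bracket `t_obs ± 50` is an assumption of the sketch, not part of the method:

```python
import numpy as np
from scipy.stats import nct
from scipy.optimize import brentq

def nc_ci_cohens_ds(x1, x2, alpha=0.05):
    """Noncentral-t CI for Cohen's ds: invert the noncentral t CDF in the
    noncentrality parameter lambda, then rescale by sqrt((n1+n2)/(n1*n2))
    as in formula (13)."""
    n1, n2 = len(x1), len(x2)
    df = n1 + n2 - 2
    sd_pooled = np.sqrt(((n1 - 1) * np.var(x1, ddof=1) +
                         (n2 - 1) * np.var(x2, ddof=1)) / df)
    t_obs = (np.mean(x1) - np.mean(x2)) / (sd_pooled * np.sqrt(1/n1 + 1/n2))

    def ncp_for(prob):
        # lambda such that the noncentral t CDF at t_obs equals prob
        # (the CDF is decreasing in lambda, so a sign change exists)
        return brentq(lambda lam: nct.cdf(t_obs, df, lam) - prob,
                      t_obs - 50, t_obs + 50)

    lam_lo = ncp_for(1 - alpha / 2)   # lambda_L: 1 - alpha/2 quantile at t_obs
    lam_hi = ncp_for(alpha / 2)       # lambda_U: alpha/2 quantile at t_obs
    scale = np.sqrt((n1 + n2) / (n1 * n2))
    return lam_lo * scale, lam_hi * scale

rng = np.random.default_rng(42)
x1 = rng.normal(0.5, 1.0, 30)
x2 = rng.normal(0.0, 1.0, 30)
lo, hi = nc_ci_cohens_ds(x1, x2)
print(f"95% NC CI for Cohen's ds: [{lo:.3f}, {hi:.3f}]")
```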
• NC intervals in Welch’s t-test with Shieh’s d
Welch’s approximate t procedure has been considered as a satisfactory and robust solution in
the two-sample t under the heterogeneous variances assumption (Delacre et al., 2017).
Welch’s statistic V is given by:
! (14)
As the exact distribution of Welch’s statistic V is comparatively complicated, the practical
importance and methodological complexity of the problem has led to numerous attempts to
develop variance procedures and algorithms for resolving the issue (Kim & Cohen, 1998, and
references therein). To accurate the corresponding interval estimation of the effect size within
the heteroscedastic framework, Shieh (2013) presented an alternative approach to construct
confidence intervals of the standardised mean difference. This approach firstly suggested that
with the same theoretical arguments and analytic derivations as in Welch, the statistic V has
the general approximate distribution:
V ∼ t(ν*, λ*)   (15)

where t(ν*, λ*) denotes a noncentral t-distribution with degrees of freedom ν* and noncentrality parameter λ* = √N × δ* (N is the total number of observations in the two samples and δ* is the theoretical value of Shieh's effect size index, see formula (5)). The degrees of freedom ν* were originally defined by Welch as:

ν* = (σ1²/n1 + σ2²/n2)² / [ (σ1²/n1)²/(n1 − 1) + (σ2²/n2)²/(n2 − 1) ]   (16)

Since this ν* depends on the unknown variances, an approximate version is suggested as:

ν̂ = (SD1²/n1 + SD2²/n2)² / [ (SD1²/n1)²/(n1 − 1) + (SD2²/n2)²/(n2 − 1) ]   (17)

Hence, the adjustment gives the following modified distribution:

V ∼ t(ν̂, √N × δ*)   (18)

The following step, finding the confidence limits of the noncentrality parameter λ*, is similar to the related step in the construction of the NC intervals around Cohen's ds, as previously illustrated. The confidence limits for λ* can be found by using the observed value of the Welch statistic V defined by formula (14). Once the confidence limits λ*L and λ*U have been obtained, they can be transformed into confidence limits for Shieh's δ* by applying the following formula:

Prob[ λ*L / √N ≤ δ* ≤ λ*U / √N ] = 1 − α   (19)
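A minimal sketch of Shieh's interval, following formulas (14), (17) and (19), i.e. assuming the point estimate of Shieh's d equals V/√N. This is illustrative Python rather than the thesis's R code, and the root-finding bracket is again an assumption of the sketch:

```python
import numpy as np
from scipy.stats import nct
from scipy.optimize import brentq

def nc_ci_shieh_d(x1, x2, alpha=0.05):
    """Noncentral-t CI for Shieh's d under heteroscedasticity:
    Welch's V with the approximate df of formula (17), noncentrality
    lambda* = sqrt(N) * delta*, limits rescaled by 1/sqrt(N) (formula (19))."""
    n1, n2 = len(x1), len(x2)
    N = n1 + n2
    v1, v2 = np.var(x1, ddof=1) / n1, np.var(x2, ddof=1) / n2
    V = (np.mean(x1) - np.mean(x2)) / np.sqrt(v1 + v2)            # formula (14)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))  # (17)

    def ncp_for(prob):
        return brentq(lambda lam: nct.cdf(V, df, lam) - prob, V - 50, V + 50)

    lam_lo, lam_hi = ncp_for(1 - alpha / 2), ncp_for(alpha / 2)
    return lam_lo / np.sqrt(N), lam_hi / np.sqrt(N)               # (19)

rng = np.random.default_rng(7)
x1 = rng.normal(1.0, 2.0, 40)   # heteroscedastic, unbalanced groups
x2 = rng.normal(0.0, 1.0, 20)
lo, hi = nc_ci_shieh_d(x1, x2)
print(f"95% NC CI for Shieh's d: [{lo:.3f}, {hi:.3f}]")
```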
• Bootstrap Bias-corrected and accelerated (BS BCa) confidence intervals
The general bootstrap technique is a resampling procedure whereby random samples are
repeatedly drawn from the set of observed data a large number of times to study the
distribution of the statistic(s) of interest given the obtained data. In this study, our interest lies
in examining the distribution of the bootstrapped effect size point estimator d values
calculated from the random samples drawn from the observed data with replacement. An
important feature of this procedure is that it makes no assumption about the parent population
from which the data were drawn other than that the data are randomly sampled and thus
representative of the parent population.
V ∼ t (ν*, λ*)
t (ν*, λ*) ν*
λ* = N × δ* δ*
ν*
ν* = {σ21 /n1 + σ2
2 /n2}2
{σ21 /n1}2 /(n1 − 1) + {σ2
2 /n2}2 /(n2 − 1)
ν*
ν = {SD21 /n1 + SD2
2 /n2}2
{SD21 /n1}2 /(n1 − 1) + {SD2
2 /n2}2 /(n2 − 1)
V ∼ t ( ν, Nδ*)
λ*
ds
λ*
λ*L λ*U
δ*
Prob. [λ*L / N ≤ δ* ≤ λ*U / N] = 1 − α
! of !14 44
As explained in 2.3, we chose the bootstrap bias-corrected and accelerated confidence
intervals (BS BCa) to build the non-parametric intervals around associated effect size point
estimators. Briefly, the computation of the BCa proceeds in three steps (Figure 3.1).
As illustrated in Figure 3.1, we first repeatedly draw random samples with replacement from the observed data B times (B = 1,000 in our study) and calculate the effect size point estimate d* from each bootstrap sample. This yields a vector of B d* values.
Next, we calculate the bias correction value z0 and the acceleration parameter a. z0 is obtained by calculating the proportion of the d* values that are less than the observed d (the effect size point estimate calculated from the observed data) and then finding the quantile of the standard normal distribution with that cumulative probability:

z0 = Φ⁻¹( #(d* < d) / B )   (20)

where Φ is the standard normal cumulative distribution function and Φ⁻¹ its inverse (e.g., Φ(1.96) = 0.975 and Φ⁻¹(0.95) = 1.645), and # is read as "the number of". The acceleration parameter a is computed as follows:

a = Σi=1..N (d̄ − d(−i))³ / ( 6 [ Σi=1..N (d̄ − d(−i))² ]^(3/2) )   (21)

where d(−i) is the value of d recomputed when the ith (i = 1, 2, …, N, with N the total number of observations in the two groups) data point has been deleted (this strategy is also known as jackknife resampling), and d̄ is the mean of the N jackknifed d(−i) values.

The last step is to find the confidence interval quantiles by means of z0 and a, and then the corresponding values from the distribution of d*. Once z0 and a have been calculated, the limits of the confidence interval are found by taking the values from the bootstrap sample that correspond to the CIL and CIU quantiles of the observed bootstrap distribution. The CIL and CIU values are found from the following equations:

CIL = Φ( z0 + (z0 + z(α/2)) / (1 − a(z0 + z(α/2))) )   (22)

and
CIU = Φ( z0 + (z0 + z(1−α/2)) / (1 − a(z0 + z(1−α/2))) )   (23)

such that CIL and CIU represent the quantiles from the distribution of d*. That is, the confidence limits from the BCa approach are obtained by finding the values from the bootstrap distribution of d* that correspond to the CIL and CIU cumulative probabilities. Following this BCa approach, we built two confidence intervals, around Cohen's ds and around its bias-corrected version Hedge's gs. The only difference in the construction procedure is that we apply formula (2) to obtain the observed d value for the confidence interval around Cohen's ds, and formula (3) when building the confidence interval around Hedge's gs.
3.1.2.2 CI width and coverage probability
To evaluate the interval estimations, CI width is well recognised as a good index of precision
because it reflects a number of aspects of the precision of a study, including the amount of
variability in the population, the sample size and thus sampling error, and the amount of error
[Figure 3.1: Computation procedure of the BCa CI. Step 1: resample the observed data with replacement B times (e.g., B = 1,000) to obtain a bootstrap sample of d* values. Step 2: compare d to the d* values to calculate z0, and use jackknife resampling to get a. Step 3: calculate the CIL and CIU quantiles using formulas (20)-(23).]
in the dependent variable (Steiger and Fouladi, 1997; Cumming and Finch, 2001). We define the CI width w as:

w = CIU − CIL   (24)

We did not take the average of the two limits as the value of the width, because the confidence interval is not a symmetric interval around δ under the noncentral t-distribution.
As an indicator of accuracy, the coverage probability is taken as the percentage of confidence intervals whose bounds correctly bracket the population value (% of coverage). The coverage percentage of an ideal confidence interval is expected to equal the specified nominal level, which is 95% in our study.
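Given a list of simulated intervals, the two indices can be tallied as below. This is an illustrative sketch; the interval values are made up for the example:

```python
import numpy as np

def coverage_and_width(intervals, true_delta):
    """Coverage: share of CIs whose bounds bracket the true parameter.
    Width: w = CI_U - CI_L for each interval (formula (24))."""
    intervals = np.asarray(intervals)          # shape (n_sim, 2)
    lo, hi = intervals[:, 0], intervals[:, 1]
    coverage = np.mean((lo <= true_delta) & (true_delta <= hi))
    widths = hi - lo
    return coverage, widths.mean(), widths.std(ddof=1)

# Hypothetical simulated intervals around a true delta of 0.5
sim_cis = [(0.1, 0.9), (0.2, 1.1), (0.6, 1.4), (-0.2, 0.7)]
cov, mean_w, sd_w = coverage_and_width(sim_cis, 0.5)
print(cov, mean_w)   # 3 of the 4 intervals bracket 0.5, so cov = 0.75
```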
3.1.3 Type I error and power for the statistical tests
In the simulation study of the statistical tests, we need to calculate the type I error rate and the empirical power of each specific test. Following the definitions previously stated in 2.4, we obtained the type I error rate as the proportion of rejections (p-value < 0.05, taking the significance level to be 0.05) among the total number of simulations when the null hypothesis is true. Similarly, we calculated the power as the proportion of rejections among the total number of simulations when the null hypothesis is false.
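This tallying procedure can be sketched as follows, with a reduced number of replications for illustration (the study itself uses 1,000,000 replications in R, and the configuration below is an example, not one of the 243 scenarios):

```python
import numpy as np
from scipy.stats import ttest_ind

def type1_error_rate(n1, n2, sigma1, sigma2, n_sim=2000,
                     alpha=0.05, welch=False, seed=11):
    """Proportion of rejections (p < alpha) when H0: mu1 = mu2 is true."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_sim):
        x1 = rng.normal(0.0, sigma1, n1)
        x2 = rng.normal(0.0, sigma2, n2)
        p = ttest_ind(x1, x2, equal_var=not welch).pvalue
        rejections += p < alpha
    return rejections / n_sim

# Unbalanced design where the larger group has the smaller variance:
# Student's t-test tends to become liberal here, Welch's stays near alpha
print(type1_error_rate(40, 20, 1.0, 2.0, welch=False))
print(type1_error_rate(40, 20, 1.0, 2.0, welch=True))
```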
Additionally, we calculated two sets of required sample sizes to achieve the same power of 80% with Student's t-test or Welch's t-test. The first calculation is conducted by controlling the effect sizes; more precisely, we compare the smallest numbers of observations that these two t-tests require to attain a power of 80%. For each effect size setting, we found the associated mean differences (calculated from the effect size) and the required sample size of the control group to detect the specific mean difference. The second situation compares the required sample sizes to attain a power of 80% to detect the same difference between two group means. The R functions to calculate the minimum sample sizes are available in Syntax 2 in the appendix.
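For the balanced Student's t-test case, the sample size search can be sketched with the noncentral t power function implied by formulas (11)-(12). This Python sketch is illustrative and is not the Syntax 2 R code:

```python
import numpy as np
from scipy.stats import nct, t as t_dist

def power_student(delta, n1, n2, alpha=0.05):
    """Power of the two-sided Student t-test for standardised difference delta:
    P(|T| > t_crit) where T is noncentral t with lambda from formula (12)."""
    df = n1 + n2 - 2
    lam = delta * np.sqrt(n1 * n2 / (n1 + n2))     # formula (12)
    t_crit = t_dist.ppf(1 - alpha / 2, df)
    return (1 - nct.cdf(t_crit, df, lam)) + nct.cdf(-t_crit, df, lam)

def min_n_per_group(delta, target=0.80):
    """Smallest balanced per-group n reaching the target power."""
    n = 2
    while power_student(delta, n, n) < target:
        n += 1
    return n

print(min_n_per_group(0.5))   # should be close to 64 per group for delta = 0.5
```

A Welch analogue follows the same search, swapping in the Welch statistic and the approximate degrees of freedom of formula (17).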
3.1.4 Summary
The methodologies to compare effect size measures, confidence interval approaches and
statistical tests under different assumptions are summarised in Table 3.1. In summary, the
analysis consists of three aspects: a) the bias, consistency and variance of the point estimates of the effect size measures; b) the coverage probability and interval widths of the interval estimations; c) the type I error rates, power and required sample sizes to achieve the expected power level of the hypothesis tests.
Table 3.1: Summary of evaluation criteria and key statistics

Point estimates — Bias and consistency: rbias = (E(d) − δ)/δ; Variance: Var(d)
Confidence intervals — Precision: w = CIU − CIL; Accuracy: % of coverage of the true parameter
NHST — Type I error: #(p < 0.05 under H0) / Nsim; Power: #(p < 0.05 under Ha) / Nsim
3.2 Methods of the Monte Carlo simulations
Guided by the specific research interests of the evaluation process, we conducted a series of Monte Carlo simulations covering three aspects.
3.2.1 Simulation design for the study of point estimators
We created 243 scenarios with varying sample sizes, population standard deviations and the
true mean differences (Table 3.2). For both normal and non-normal distributions, we chose
increasing sample sizes from very limited observations (e.g., only 10 observations in each
group) to large numbers (e.g., at least 50 observations in each group) in combination with
increasing ratios between sample sizes. Without loss of generality, we set the second group as the control group with constant configurations of the mean (μ2 = 0) and the standard deviation (σ2 = 1). We give fewer configurations for each non-normal distribution, as the resulting number of total distributions is still comparable to that of the normal distributions. For the parameters of non-normal distributions that are not defined by mean and standard deviation, we chose configurations that show apparent non-normality; the mean and standard deviation are then calculated from the non-normal parameters such as scale, shape or skewness.
Three non-normal distributions were chosen based on a consideration of their value range and
application frequency in psychological studies. They are Skew-normal (Azzalini, 1985), Sinh-
arcsinh (Jones and Pewsey, 2009) and Gamma distributions (see distribution illustrations in
Figure 1 in appendix). Each scenario is based on 100,000 simulations of datasets in the
program R.
Table 3.2: Simulation design for the study of point estimators

Control group - Experimental group | Scenarios | μ1 − μ2 | SD ratio (σ1/σ2)
Normal - Normal | 162 | 0.1; 0.5; 0.8; 1.5; 2; 5 | 0.5; 1; 1.5
Normal - Skew-normal | 18 | -2; -1 | 1; 2
Skew-normal - Skew-normal | 9 | -1 | 2
Normal - SAS normal | 18 | -0.7; 1.6 | 1.1; 1.6
SAS normal - SAS normal | 9 | 2.3 | 1.4
Normal - Gamma | 18 | -2; -1 | 0.7; 1.4
Gamma - Gamma | 9 | 1 | 2

Sample parameters for all rows: total sample size = 20; 25; 30; 40; 50; 60; 100; 125; 150, with sample size ratio (n1/n2) = 1; 1.5; 2.
For the effect size measures that are expressed in terms of medians (e.g., dMAD) or trimmed variances (dR), it is very complicated to obtain the true parameters of the population, since the simulated distributions are not defined by medians (if the data do not follow a normal distribution) or by any trimmed values. As a solution, we generated a large random sample of 100,000 observations as the population pool, and then randomly drew samples of restricted size from this parent pool for the later analysis. As we have the entire population dataset, it is straightforward to obtain the values of the medians, the trimmed variances and the winsorized standard deviations.
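The population-pool trick can be sketched as follows. This is illustrative Python; the skew-normal shape a = 5 and the 20% winsorizing proportion are example choices, not the thesis's exact settings:

```python
import numpy as np
from scipy.stats import skewnorm

# Build a large "population pool" from a skew-normal distribution, then
# read the parameters that have no closed form in our design (medians,
# winsorized spread) directly off the pool.
rng = np.random.default_rng(2024)
pool = skewnorm.rvs(a=5, size=100_000, random_state=rng)

true_median = np.median(pool)

def winsorized_sd(x, prop=0.2):
    """SD after clamping the lowest/highest `prop` share of the data."""
    lo, hi = np.quantile(x, [prop, 1 - prop])
    return np.clip(x, lo, hi).std(ddof=1)

true_wsd = winsorized_sd(pool)

# Later analyses draw restricted-size samples from this parent pool
sample = rng.choice(pool, size=30, replace=False)
print(true_median, true_wsd, sample.mean())
```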
3.2.2 Simulation design for the study of confidence intervals
For the part on confidence intervals, we mainly focus on the comparison of four confidence intervals: the NC procedure with Cohen's ds in Student's t-test, the NC procedure with Shieh's d in Welch's t-test, and the BS BCa method with Cohen's ds and with Hedge's gs. We do not apply the bootstrapping procedures to all the other indices, because we are more interested in detecting the difference between the NC and BS BCa approaches than in exploring the minor differences between different estimators under one procedure. On the other hand, an economical research design is always preferred in practice, since the computation of bootstrap procedures under a large number of simulation replications requires a huge amount of time and resources. As a result, we selected a few settings that represent the sensitive situations when the assumptions are met or violated (see an overview of the configurations in Table 3.3).
Table 3.3: Configurations in the simulation study of confidence intervals

Effect size: 0; 0.2; 0.5; 1; 2 (crossed with each row below)

n1 | n2 | SD1 | SD2
10 | 10 | 1 | 1
10 | 10 | 2 | 1
20 | 10 | 1 | 1
20 | 10 | 2 | 1
75 | 50 | 1 | 1
75 | 50 | 2 | 1
With the given sample sizes and parameter configurations, estimates of the true coverage probability are computed through Monte Carlo simulation of 100,000 independent datasets. For each replicate, the four sets of confidence limits associated with the two-sided lower and upper 100(1 − α)% confidence intervals are computed, together with their coverage probability, the interval widths, the standard deviation of the interval widths, the means and medians of the interval widths and interval limits, and the empirical powers.
3.2.3 Simulation design for the study of hypothesis tests
Considering that computing the type I error rate and power requires a large number of replications, the type I error rates and empirical powers are calculated from 1,000,000 simulations under each of the 243 scenarios (see configurations in Table 3.2).
In addition, we illustrate the different p-value distributions of Student's t-test, Welch's t-test and the Mann-Whitney U test under typical situations where problems easily occur. The configurations of this illustration cover cases representing: very small samples with a balanced design and unequal variances; very small samples with an unbalanced design but equal variances; and big samples with an unbalanced design and unequal variances.
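The p-value illustration can be sketched as below, in illustrative Python with a reduced number of replications; the configuration shown is one example of the "small balanced samples with unequal variances" case, not necessarily the thesis's exact setting:

```python
import numpy as np
from scipy.stats import ttest_ind, mannwhitneyu

def pvalue_distributions(n1, n2, sigma1, sigma2, n_sim=2000, seed=9):
    """p-values of the three tests under H0 (equal means); a well-behaved
    test keeps the p-value distribution close to uniform here."""
    rng = np.random.default_rng(seed)
    p_student, p_welch, p_mw = [], [], []
    for _ in range(n_sim):
        x1 = rng.normal(0, sigma1, n1)
        x2 = rng.normal(0, sigma2, n2)
        p_student.append(ttest_ind(x1, x2, equal_var=True).pvalue)
        p_welch.append(ttest_ind(x1, x2, equal_var=False).pvalue)
        p_mw.append(mannwhitneyu(x1, x2, alternative="two-sided").pvalue)
    return map(np.asarray, (p_student, p_welch, p_mw))

# very small, balanced samples with unequal variances
ps, pw, pmw = pvalue_distributions(10, 10, 1.0, 4.0)
for name, p in [("Student", ps), ("Welch", pw), ("Mann-Whitney", pmw)]:
    print(name, round(np.mean(p < 0.05), 3))
```

Plotting histograms of `ps`, `pw` and `pmw` reproduces the kind of p-value distribution comparison described in the text.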
In the power analysis part, we calculated two types of minimum sample sizes to obtain a power of 80%, by controlling either the effect size or the group means. As larger effect sizes or mean differences would not be a practical concern, we only chose values of the effect size or the mean difference that are less than 1.
4 Results
4.1 Bias, consistency and variance of the point estimators
In order to investigate how Cohen's ds and other effect size indices react across the
assumptions of normality and homogeneity of variances, Monte Carlo simulations of 243
scenarios were conducted. For each of the 243 scenarios (see configurations in Table 3.2), we
examined the bias rates and variances of the proposed effect size estimators related to specific
condition (see effect size measures in Table 1.1). As a result, the study covers four different
assumptions: Assumption 1 - both normality and homogeneity of variances are met (see
detailed results in Table 4.1 in appendix); Assumption 2 - only normality is met (Table 4.2 in
appendix); Assumption 3 - only homogeneity is met (Table 4.3 in appendix); Assumption 4 -
non-normality and heterogeneity of variances (Table 4.4 in appendix). As previously explained (see 3.1.1), we are mainly interested in observing the bias rate and variance of the estimators, as well as the trend of the bias rate with increasing sample sizes, for the comparisons among the various effect size measures.
4.1.1 Cohen’s ds, Hedge’s gs, Glass’s ds and Shieh’s under Assumption 1
We first compare the performance of the effect size estimators under Assumption 1 (normality and homogeneity of variances) in terms of bias rate and variance. Figure 4.1 shows that estimator bias tends to decrease, and precision improves, with increasing sample sizes. For samples with more than 50 observations, all effect size estimators except Glass's ds are asymptotically unbiased, with bias rates of less than 1%.
For groups with the same number of observations, the effect size estimators tend to be more biased and more variable as the difference between the two group means enlarges, with the exception of the case n1 : n2 = 40 : 20 and μ1 − μ2 = 0.1 (Figure 4.1).
Among the effect size indicators, Glass's ds shows the least precision and the highest bias rates (Figure 4.1). Cohen's ds shows the lowest bias rates under all configurations, but not the best precision. As the bias-corrected version, Hedge's gs produces similar bias rates and slightly smaller variances compared to Cohen's ds. Shieh's d, on the other hand, demonstrates comparable bias rates to Cohen's ds and Hedge's gs, but significantly outperforms them in precision (Figure 4.1). Notably, the bias rate also depends on the experimental design. When we are working with two balanced groups, Cohen's ds and Shieh's d constantly produce the same
bias rates, which are much smaller than those of Glass's ds. This is no surprise, since the relationship between Cohen's ds and Shieh's d, assuming equal sample sizes and standard deviations, can be deduced as: Cohen's ds = 2 × Shieh's d and Var(Cohen's ds) = 4 × Var(Shieh's d). On the other hand, for two unbalanced groups, Cohen's ds is noticeably less biased than Shieh's d, but this difference decreases with increasing sample sizes and does not appear to be associated with the effect size magnitude (Figure 4.1). Nevertheless, Shieh's d consistently shows smaller variance in the sample estimates than Cohen's ds. Therefore, Shieh's d is preferred due to its comparable bias rates but significantly smaller variances compared to Cohen's ds.
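The stated relationship is easy to verify numerically, assuming Shieh's d is computed as (X̄1 − X̄2)/√(N(SD1²/n1 + SD2²/n2)) (our reading of formula (5)); under that assumption the ratio is exactly 2 for any balanced design:

```python
import numpy as np

def cohens_ds(x1, x2):
    n1, n2 = len(x1), len(x2)
    sp = np.sqrt(((n1 - 1) * np.var(x1, ddof=1) +
                  (n2 - 1) * np.var(x2, ddof=1)) / (n1 + n2 - 2))
    return (np.mean(x1) - np.mean(x2)) / sp

def shiehs_d(x1, x2):
    n1, n2 = len(x1), len(x2)
    N = n1 + n2
    denom = np.sqrt(N * (np.var(x1, ddof=1) / n1 + np.var(x2, ddof=1) / n2))
    return (np.mean(x1) - np.mean(x2)) / denom

rng = np.random.default_rng(5)
x1, x2 = rng.normal(0.5, 1, 20), rng.normal(0.0, 1, 20)  # balanced design
print(cohens_ds(x1, x2) / shiehs_d(x1, x2))   # exactly 2.0 when n1 = n2
```

For n1 = n2 = n, Shieh's denominator is √(2(SD1² + SD2²)) while the pooled SD is √((SD1² + SD2²)/2), so the ratio of the two estimators is 2 regardless of the sample SDs.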
4.1.2 Cohen’s ds, Hedge’s gs, Glass’s ds and Shieh’s d under Assumption 2
We next investigate the performance of these estimators under the assumption of normality but with heterogeneity of variances. As under Assumption 1, the bias rate and variance of all estimators tend to decrease with increasing sample size (Figure 4.2). Cohen's ds always behaves similarly to Hedge's gs in both bias rate and precision, but is clearly less biased than Glass's ds and less precise than Shieh's d (the relationship between the variances of Cohen's ds and Shieh's d can be derived as: Var(Cohen's ds) = 4 × Var(Shieh's d)) (Figure 4.2). Overall, Shieh's d is the best estimator due to its comparable bias rate but clearly lower variance (Figure 4.2).
[Figure 4.1: Assumption 1 - Normality and homogeneity of variances. Bias rate and variance (SDR = 1) of Cohen's ds, Hedge's gs, Glass's ds and Shieh's d for μ1 − μ2 = 0.1, 0.5, 1.5 and n1:n2 = 10:10, 20:10, 20:20, 40:20, 50:50, 100:50. SDR: standard deviation ratio.]
Remarkably, the simulation results reveal that the sample size allocation and the standard deviation ratio may jointly influence the behaviour of Shieh's d (Figure 4.2, Table 4.2 in appendix). Given unbalanced designs, if the standard deviation is smaller in the group with more observations (e.g. n1 : n2 = (20 : 10), (40 : 20) or (100 : 50) with σ1 : σ2 = 0.5), Cohen's ds / Hedge's gs is less biased than Shieh's d; otherwise, Shieh's d shows the lowest bias rate, which is similar to that of Cohen's ds / Hedge's gs (Figure 4.2).
4.1.3 Cohen’s ds, Hedge’s gs, Shieh’s d, dMAD, dR and PSindep under Assumptions 3 and 4
When the assumption of normality is violated, both the bias rates and the variances of all effect size estimators again decrease with increasing sample sizes (Figures 4.3 - 4.4). This is consistent under both the homoscedastic and heteroscedastic frameworks. The non-parametric estimator PSindep shows the lowest bias rates with the smallest variances among all the proposed effect size indices (Figures 4.3 - 4.4), while among the parametric effect size indices, Shieh's d is still the best indicator of effect size, producing comparable bias rates but smaller variances under all configurations (Figures 4.3 - 4.4). These findings are consistent under the skew-normal, SAS normal and Gamma distributions. Notably, dMAD is significantly biased in all cases; this may be because our configured non-normal distributions are not severely skewed, which is where this effect size measure shows its greatest advantages.
[Figure 4.2: Assumption 2 - Normality and heterogeneity of variances. Bias rate and variance of Cohen's ds, Hedge's gs, Glass's ds and Shieh's d for SDR = 0.5 and SDR = 1.5, μ1 − μ2 = 0.1, 0.5, 1.5 and n1:n2 = 10:10, 20:10, 20:20, 40:20, 50:50, 100:50. SDR: standard deviation ratio.]
[Figure 4.3: Assumption 3 - Non-normality and homogeneity of variances. Bias rate and variance (μ1 − μ2 = -1, SDR = 1) of Cohen's ds, Hedge's gs, Shieh's d, dMAD, dR and PSindep for n1:n2 = 10:10, 20:10, 20:20, 40:20, 50:50, 100:50. SDR: standard deviation ratio.]
[Figure 4.4: Assumption 4 - Non-normality and heterogeneity of variances. Bias rate and variance of Cohen's ds, Hedge's gs, Shieh's d, dMAD, dR and PSindep under the skew-normal, sinh-arcsinh normal and Gamma distributions, for various (μ1 − μ2, SDR) configurations and n1:n2 = 10:10, 20:10, 20:20, 40:20, 50:50, 100:50. SDR: standard deviation ratio.]
Across the different assumptions, it is also observed that the behaviour of Cohen's ds in terms of bias rates and variances varies with the conditions (Figures 4.1 - 4.4). When the assumptions of normality and homogeneity of variances are both met, Cohen's ds / Hedge's gs is consistently less biased under both balanced and unbalanced experimental designs compared to the other two effect size measures (Figure 4.1, Table 4.1 in appendix). However, it tends to be more biased under Assumption 2 given the same configurations as Assumption 1, and commits very high bias rates (e.g. 11.42% under skew-normal distributions, 14.86% under Gamma distributions) when the assumptions of normality and homogeneity are both violated (Table 4.4 in appendix). Although the bias rate of Cohen's ds decreases overall with growing numbers of observations, it still seems relatively high compared to the simulation results for the same sample sizes under Assumption 1 (Figure 4.4, Table 4.4 in appendix). In addition, it always suffers from larger variances than Shieh's d under all assumptions. Notably, we cannot provide further discussion on the exact within-estimator differences across conditions, since the parameter configurations are not unified between the normal (Assumptions 1 and 2) and non-normal distributions (Assumptions 3 and 4). This will be further discussed in the Discussion section.
4.2 Accuracy and precision of confidence intervals
As highlighted in 2.3, reporting the associated confidence interval around the effect size point estimator provides all the information that p-values do, and more. Therefore, besides the examination of the different effect size measures, we also constructed four confidence intervals following the parametric and non-parametric approaches, and compared the interval estimates in terms of coverage probability, which indicates the accuracy of the estimates, and CI width, which indicates their precision.
The simulation study makes comparisons among the following four confidence intervals (the construction methodologies are described in 3.1.2.1): the NC t-distribution CI around Cohen's ds assuming homogeneity of variances; the NC t-distribution CI around Shieh's d under the heteroscedastic framework; the BS BCa CI around Cohen's ds; and the BS BCa CI around Hedge's gs. To avoid the large computing load that would be caused by the bootstrap procedures, and also to prevent repetition across similar assumption cases, we did not repeat all the scenarios of the simulation study of the point estimators, but only the representative configurations that show the most potentially problematic situations in reality.
The parameter configurations under the normal and skew-normal distributions are the same in each scenario: variances (σ1², σ2²) = (1,1) and (2,1), and sample sizes (n1, n2) = (10,10), (20,10) and (75,50). These settings not only include both homoscedastic/heteroscedastic and balanced/unbalanced designs, but also create direct and inverse pairings between the variance and sample size structures. Overall, these considerations result in a total of six different joint configurations for each of the parametric and non-parametric simulations. Without loss of generality, the second group mean is fixed as μ2 = 0, and the first group mean μ1 is chosen such that the standardised mean difference Cohen's ds = 0, 0.2, 0.5, 1, 2, representing small, medium and large effect sizes, for each combined structure of (σ1², σ2²) and (n1, n2). Detailed results of the Monte Carlo simulations under each of the 60 scenarios are listed in Tables 4.5 - 4.16 in the appendix. We illustrate the two most important statistics of the confidence intervals (i.e., coverage probability and interval width) in Figures 4.5 and 4.6.
Overall, the NC method outperforms the BS BCa method either when the assumptions of normality and homogeneity of variances are both met or under a true null hypothesis. When the null hypothesis is true, NC intervals around Cohen's ds always come closest to the 95% coverage percentage under Assumption 1; otherwise, NC intervals around Shieh's d mostly achieve the best accuracy, with two exceptions involving unbalanced designs with equal variances under small samples (Tables 4.13 and 4.15 in appendix). When there is a real difference between the two groups (the alternative hypothesis is true), NC intervals around Shieh's d give the most accurate estimations for small and medium effect sizes (δ = 0.2 and 0.5) under smaller samples, and for all ranges of effect sizes when the sample size gets larger (Tables 4.5, 4.7 and 4.9 in appendix). When the assumption of homogeneity of variances is violated, the NC interval with Shieh's d shows distinct advantages in terms of coverage probability compared to all the other intervals, across all experimental designs and sample sizes (Figure 4.5, Tables 4.6, 4.8, 4.10 in appendix). Thus, the NC confidence interval around Shieh's d performs best in terms of coverage probability either when the parametric assumptions are met or when the null hypothesis is true.
When the parametric assumptions are violated, BS BCa intervals around Cohen's ds provide coverage percentages closest to 95% when the number of observations is large (Tables 4.15 and 4.16 in appendix). Noticeably, there is an under-coverage problem with BS BCa intervals around Hedge's gs when the two groups contain moderate numbers of observations, but the BS approach tends to approximate satisfying coverage percentages when the sample size gets larger (Figure
4.5, Tables 4.5 - 4.16 in appendix). This also suggests the importance of an adequate number of independent pieces of information in the framework of resampling. However, if the observations are relatively limited, the NC interval with Shieh's d still gives overall the most satisfying coverage percentages (Figure 4.5, Tables 4.11, 4.12 in appendix). Therefore, when the assumption of normality is violated, BS BCa intervals around Cohen's ds offer the most accurate interval estimations with adequate observations (e.g., at least 50 observations in each group), while the parametric NC intervals around Shieh's d achieve satisfying coverage probabilities with small sample sizes (e.g., no more than 20 observations in each group).
In terms of the precision of the estimations, overall, all intervals tend to give more precise estimations with increasing sample sizes (Figure 4.6). Moreover, the mean widths of all the proposed intervals broaden with enlarging effect sizes. In comparison, the NC interval around Shieh's d achieves the narrowest widths across all assumptions, while the NC intervals with
[Figure 4.5: Coverage (%) of the confidence intervals (NC - Cohen's ds, NC - Shieh's d, BS BCa - Cohen's ds, BS BCa - Hedge's gs) under Assumptions 1 and 2 and under Assumptions 3 and 4, for effect sizes 0, 0.2, 0.5, 1, 2; n1:n2 = 10:10, 20:10, 75:50; σ1 = σ2 = 1 and σ1 = 2, σ2 = 1.]
Cohen’s ds are wider than twice the length of the NC intervals with Shieh’s d. In addition, the BS intervals are also more precise than the NC intervals around Cohen’s ds.
To sum up, the coverage probability and interval width of a given CI vary across assumptions, experimental designs and sample sizes. Accordingly, the choice of interval approach should take all of these factors, as well as the effect size magnitude if available, into account.
Based on our observations in the simulation study, NC intervals around Shieh’s d maintain stable coverage percentages that are very close to the expected nominal level (95% in this study) across all conditions under the normality assumption. When the parametric assumption is violated, this interval approach still has advantages under balanced designs, but under unbalanced designs it can yield misleading confidence interval limits such that the empirical coverage is greater or less than the specified nominal coverage. On the other hand, BS BCa intervals are preferred under non-parametric conditions, especially when the number of observations is adequate.
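To illustrate, a BS BCa interval around Cohen’s ds can be obtained in a few lines. The sketch below uses Python’s scipy.stats.bootstrap rather than the R code used in this study; the data, group sizes and seed are arbitrary choices for the example:

```python
import numpy as np
from scipy import stats

def cohens_ds(x, y):
    """Cohen's ds: mean difference standardised by the pooled SD."""
    n1, n2 = len(x), len(y)
    sp = np.sqrt(((n1 - 1) * np.var(x, ddof=1) + (n2 - 1) * np.var(y, ddof=1))
                 / (n1 + n2 - 2))
    return (np.mean(x) - np.mean(y)) / sp

rng = np.random.default_rng(1)
x = rng.normal(0.5, 1.0, size=75)  # group 1
y = rng.normal(0.0, 1.0, size=50)  # group 2

# BCa bootstrap interval (95% by default); each group is resampled
# independently, matching the two-independent-groups design
res = stats.bootstrap((x, y), cohens_ds, vectorized=False,
                      n_resamples=2000, method='BCa', random_state=1)
print(res.confidence_interval)
```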
[Figure 4.6: Average widths (0-3) of the four confidence intervals (NC - Cohen’s ds, BS BCa - Cohen’s ds, NC - Shieh’s d, BS BCa - Hedge’s gs) under assumptions 1 and 2 and assumptions 3 and 4, for effect sizes 0-2, sample sizes n1:n2 = 10:10, 20:10 and 75:50, and σ1 = σ2 = 1 versus σ1 = 2, σ2 = 1.]
4.3 Statistical tests to compare two groups
As highlighted in 2.4, effect sizes are always associated with inferential statistical tests, and these tests react differently under different assumptions in terms of the rates of committing type I and type II errors. In the context of comparing two independent group means, Student’s t-test and Welch’s t-test are the most commonly used significance tests, depending on the assumptions: the former applies if we can assume homogeneity of variances in the two groups, while the latter offers more flexibility when we cannot assume equal variances. Accordingly, all the parametric effect size measures can be connected to these two t-tests through the consistency of their assumptions, e.g., Cohen’s ds with Student’s t-test, Shieh’s d with Welch’s t-test, etc. When the number of observations is extremely small and the data do not follow a normal distribution, the Mann-Whitney U test is widely used as an alternative to the t-tests, although it compares two group distributions rather than two group means. We consider that these three tests cover the range of effect size measures of interest. As a complement to the study of effect size measures and CI approaches, we explore the differences between these tests in terms of type I error rates, empirical powers and required sample sizes.
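For concreteness, the three tests can be run side by side. A minimal sketch in Python with SciPy (the simulated data and seed are arbitrary; the thesis simulations themselves were run in R):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
g1 = rng.normal(0.5, 2.0, size=40)  # group 1: larger variance
g2 = rng.normal(0.0, 1.0, size=25)  # group 2

# Student's t-test: assumes equal variances (pooled SD)
t_s, p_s = stats.ttest_ind(g1, g2, equal_var=True)

# Welch's t-test: drops the homogeneity-of-variances assumption
t_w, p_w = stats.ttest_ind(g1, g2, equal_var=False)

# Mann-Whitney U test: compares the two distributions, not the means
u, p_u = stats.mannwhitneyu(g1, g2, alternative='two-sided')

print(f"Student: t={t_s:.3f}, p={p_s:.4f}")
print(f"Welch:   t={t_w:.3f}, p={p_w:.4f}")
print(f"Mann-Whitney: U={u:.1f}, p={p_u:.4f}")
```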
4.3.1 Type I error rates
Monte Carlo simulation results of type I error rates and empirical powers under each of 243
scenarios (see configurations in Table 3.2) are based on 1,000,000 replications in the program
R (see detailed simulation results in Table 4.17 - 4.20 in appendix). Table 4.1 summarises the
simulation results of type I error rates across four assumptions. When the assumptions are
met, Student’s t-test is obviously better than Welch’s t-test, because it produces stable type I
error rates (SD( ! )=0.2%) which exactly yield the expected nominal level ( ! =5%) on average.
Welch’s t-test generates more various type I error rates (SD( ! )=0.7%) slightly further from
5% (Mean( ! )=0.0494) under balanced designs. Furthermore, Welch’s t-test obviously
outperforms Student’s t-test when the assumption of homogeneity of variances is violated
(Assumption 2). It produces the lowest average type I error rate and with smaller variance on
average, which indicates a stable level of the observed type I error rates. The simulation result
indicates that the Welch t-test appears to have a stable type I error rate around the expected
alpha level even under non-normal conditions. On the other hand, we do expect a better
performance of Mann-Whitney U test under non-normal conditions but the result doesn’t
show evidence. This is mainly due to the difference of the null hypothesis between t-tests and
Mann-Whitney U test. The Mann-Whitney U test compares two distributions rather than two group means, whereas in the simulation studies we generated the true null hypothesis by drawing random samples with equal means. Moreover, the undesirably high type I error rates produced by the Mann-Whitney U test again confirm its lack of immunity to heteroscedasticity.
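The heteroscedasticity effect described above is easy to reproduce. The following sketch is a small-scale Python analogue of our R simulations (the replication count and seed are arbitrary); it estimates type I error rates when the smaller group has the larger standard deviation:

```python
import numpy as np
from scipy import stats

def type1_rates(n1, n2, sd1, sd2, reps=10000, alpha=0.05, seed=0):
    """Empirical type I error rates of both t-tests under a true null
    (equal means), for given group sizes and standard deviations."""
    rng = np.random.default_rng(seed)
    rej_student = rej_welch = 0
    for _ in range(reps):
        x = rng.normal(0.0, sd1, n1)
        y = rng.normal(0.0, sd2, n2)
        if stats.ttest_ind(x, y, equal_var=True).pvalue < alpha:
            rej_student += 1
        if stats.ttest_ind(x, y, equal_var=False).pvalue < alpha:
            rej_welch += 1
    return rej_student / reps, rej_welch / reps

# Unbalanced design where the smaller group has the larger SD:
# Student's t-test is expected to be liberal, Welch's to stay near 5%
a_student, a_welch = type1_rates(n1=10, n2=20, sd1=2.0, sd2=1.0)
print(f"Student: {a_student:.4f}  Welch: {a_welch:.4f}")
```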
Note. Mean(α) is the average of the observed type I error rates in the simulation studies, and SD(α) is their standard deviation. Cell colours indicate how far the observed α deviates from the significance level (α = 0.05 in this study), i.e., a deeper colour indicates a shorter distance between the observed type I error rate and 0.05. The cut-off values defining the colours are 0.05, 0.001, 0.005, 0.01.
These findings are confirmed by simulation results for the p-values under the true null hypothesis, based on 1,000,000 independent tests. In simulating the p-values of the three tests mentioned above, we configured small samples in both homoscedastic and heteroscedastic frameworks and large samples with unequal variances, in combination with balanced and unbalanced designs (Table 4.2).
When the assumptions are met (Assumption 1), i.e., two normally distributed groups with equal variances, the type I error rate (the probability of rejecting the null hypothesis when it is true) is very similar for Student’s t-test and Welch’s t-test, even under unbalanced
Table 4.1: Simulation results of type I error rates of Student’s t-test, Welch’s t-test and the Mann-Whitney U test

Assumption  Distribution  Design      Student Mean(α)/SD(α)  Welch Mean(α)/SD(α)  Mann-Whitney U Mean(α)/SD(α)
1           Normal        Balanced    0.0500 / 0.0002        0.0494 / 0.0007      -
1           Normal        Unbalanced  0.0500 / 0.0002        0.0499 / 0.0003      -
2           Normal        Balanced    0.0519 / 0.0014        0.0499 / 0.0004      -
2           Normal        Unbalanced  0.0646 / 0.0354        0.0500 / 0.0006      -
3           Skew-normal   Balanced    0.0508 / 0.0009        0.0501 / 0.0001      0.0555 / 0.0086
3           Skew-normal   Unbalanced  0.0508 / 0.0007        0.0499 / 0.0002      0.0590 / 0.0074
4           Skew-normal   Balanced    0.0570 / 0.0041        0.0546 / 0.0028      0.1050 / 0.0381
4           Skew-normal   Unbalanced  0.0264 / 0.0067        0.0510 / 0.0012      0.0946 / 0.0425
4           SAS           Balanced    0.0498 / 0.0004        0.0490 / 0.0011      0.0472 / 0.0031
4           SAS           Unbalanced  0.0498 / 0.0002        0.0500 / 0.0005      0.0487 / 0.0008
4           Gamma         Balanced    0.0592 / 0.0124        0.0578 / 0.0117      0.1530 / 0.1800
4           Gamma         Unbalanced  0.0505 / 0.0191        0.0517 / 0.0030      0.1760 / 0.2110
designs (Figure 4.7(a)-1). Student’s t-test produced slightly more stable and lower type I error rates than Welch’s t-test when the sample size was extremely small (only 5 observations in one group), but this difference quickly disappeared once there were at least 10 observations in each group. Thus, when the assumptions of normality and homogeneity of variances are met, Student’s t-test is slightly better than Welch’s t-test in terms of type I error rates for extremely small sample sizes (n < 15); otherwise, the two t-tests perform similarly.
When we can only assume normality (Assumption 2), different p-value distributions emerge for the two t-tests even under balanced designs (Figure 4.7(a)-2), and Welch’s t-test achieves the more stable and lower type I error rates (Figures 4.7(a)-3 and 4.7(a)-4). Moreover, the distinction becomes more pronounced as the difference between the two group variances increases, even with larger numbers of observations. Student’s t-test yields results that are either too conservative or too liberal due to the heteroscedasticity, while the observed p-values of Welch’s t-test maintain a consistently stable distribution (Figure 4.7(a)-4). Therefore, Welch’s t-test is preferred to Student’s t-test under Assumption 2.
Table 4.2: Configurations in the simulation study of p-values

Assumption  Distribution  Description                                    n1 : n2  σ1 : σ2
1           Normal        unbalanced small samples, equal variances      15:10    1:1
2           Normal        balanced small samples, unequal variances      10:10    0.5:2
2           Normal        unbalanced big samples, unequal variances      75:50    0.5:1
2           Normal        unbalanced big samples, unequal variances      100:50   3:1
3           Skew-normal   unbalanced small samples, equal variances      15:10    1:1
4           Skew-normal   balanced small samples, unequal variances      10:10    2:1
4           Skew-normal   unbalanced big samples, unequal variances      75:50    2:1
4           Skew-normal   unbalanced big samples, unequal variances      50:100   2:1
3           Gamma         unbalanced small samples, equal variances      15:10    2:2
4           Gamma         balanced small samples, unequal variances      10:10    1:0.5
4           Gamma         unbalanced big samples, unequal variances      75:50    2:0.5
4           Gamma         unbalanced big samples, unequal variances      50:100   1:0.5
The findings are consistent under skew-normal distributions (Figure 4.7(b)). Both Welch’s t-test and Student’s t-test maintain stable type I error rates either when the assumption of homogeneity of variances is met or under balanced designs, even with very small sample sizes. On the other hand, when heteroscedasticity is combined with unbalanced group sizes, even with adequate observations (e.g., n ≥ 50 for
[Figure 4.7: Histograms of the observed p-values of Student’s t-test, Welch’s t-test and the Mann-Whitney U test under (a) Normal, (b) Skew-normal and (c) Gamma distributions. Panels (1)-(4) correspond to the sample size and standard deviation ratio (SDR) configurations of Table 4.2.]
each group), the difference between the two t-tests remains remarkable (Figures 4.7(b)-3 and (b)-4). Note that the Mann-Whitney U test does not perform better here, as might be expected; this is mostly because its assumptions are not met in the configurations of the simulated data. Nevertheless, we did not observe the same trend under gamma distributions, which indicates that using Welch’s t-test as a nonparametric test is not universally valid, especially when the sample distribution deviates severely from the normal shape or the two group standard deviations differ extremely (Figure 4.7(c)).
To sum up, Welch’s t-test is preferred to Student’s t-test under the heteroscedasticity framework and unbalanced designs; Welch’s t-test can even be used in nonparametric situations, as long as the number of observations is adequate and the data do not originate from two populations with extremely different variances.
4.3.2 Power analysis
4.3.2.1 Empirical powers under different assumptions
In general, the power of hypothesis tests improves either with increasing sample sizes or with increasing effect sizes (Tables 4.3 - 4.4; see detailed results in Tables 4.17 - 4.20 in appendix). This is reasonable: if the effect size is small, we need more observations to detect significance; if the effect size is large, we can detect a significant effect without a large number of observations (e.g., with fewer than 50 observations in total).
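This trade-off between effect size and sample size can be illustrated with a small simulation. The sketch below (Python rather than our R code; the replication count, sample sizes and seed are arbitrary) estimates the empirical power of Welch’s t-test for normal data:

```python
import numpy as np
from scipy import stats

def empirical_power(d, n, reps=5000, alpha=0.05, seed=0):
    """Empirical power of Welch's t-test for a true standardised mean
    difference d, with n observations per group and unit variances."""
    rng = np.random.default_rng(seed)
    hits = sum(
        stats.ttest_ind(rng.normal(d, 1.0, n), rng.normal(0.0, 1.0, n),
                        equal_var=False).pvalue < alpha
        for _ in range(reps)
    )
    return hits / reps

p_small = empirical_power(d=0.2, n=25)     # small effect, small samples
p_large_d = empirical_power(d=0.8, n=25)   # large effect, same samples
p_large_n = empirical_power(d=0.2, n=200)  # small effect, large samples
print(p_small, p_large_d, p_large_n)
```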
When the assumptions of normality and homogeneity of variances are met (Assumption 1), Student’s t-test is more powerful in detecting the mean difference between two groups, but the difference between the two t-tests is extremely small (about 0.1% on average) (Table 4.3). When the assumption of equal variances is violated, Welch’s t-test loses more power, and the difference in power rates between the two t-tests increases to 0.5% on average (Table 4.3). When we are working with two skew-normally distributed groups with equal variances (Assumption 3), the Mann-Whitney U test shows more power than the t-tests, especially in small samples; besides, Welch’s t-test appears more powerful than Student’s t-test in detecting the difference between the two group means (Table 4.4). When the assumptions of normality and homogeneity of variances are both violated (Assumption 4), Welch’s t-test possesses consistently higher power than Student’s t-test and is even more powerful than the Mann-Whitney U test, which confirms the potential robustness of Welch’s t-test in non-parametric situations. Note that our simulation study does not show distinct advantages of the Mann-Whitney U test under non-parametric conditions, especially when the assumptions of normality and homogeneity of variances are both violated. This reminds us that the application of the Mann-Whitney U test still has restrictions and that it may not be an optimal choice under Assumption 4.
Note. Observed effect sizes are calculated from Hedge’s gs (formula (3)). Selected effect size values ranging from 0 to 2 are categorised into three intervals following Cohen’s criterion (described in 2.1). Mean(power) is the average of the observed power rates in each category for the specific combination of effect size and sample size. Difference is calculated as the power of Student’s t-test minus the power of Welch’s t-test: a positive value indicates larger power for Student’s t-test, while a negative value indicates that Welch’s t-test is more powerful.
Table 4.3: Summary of simulation results of powers of Student’s t-test and Welch’s t-test under normal distributions

Assumption  Total sample size  Observed effect size  Student’s t-test  Welch’s t-test  Difference
1           <50                (0, 0.2)              0.0576            0.0569          0.0007
1           <50                [0.2, 0.8)            0.3813            0.3753          0.0060
1           <50                >0.8                  0.9808            0.9793          0.0015
1           [50, 100)          (0, 0.2)              0.0641            0.0640          0.0002
1           [50, 100)          [0.2, 0.8)            0.6060            0.6010          0.0050
1           [50, 100)          >0.8                  0.9998            0.9998          0.0000
1           >100               (0, 0.2)              0.0839            0.0837          0.0002
1           >100               [0.2, 0.8)            0.8757            0.8750          0.0007
1           >100               >0.8                  1.0000            1.0000          0.0000
2           <50                (0, 0.2)              0.0679            0.0576          0.0104
2           <50                [0.2, 0.8)            0.2948            0.2878          0.0071
2           <50                >0.8                  0.9270            0.9182          0.0087
2           [50, 100)          (0, 0.2)              0.0821            0.0643          0.0178
2           [50, 100)          [0.2, 0.8)            0.4805            0.4866          -0.0061
2           [50, 100)          >0.8                  0.9900            0.9834          0.0065
2           >100               (0, 0.2)              0.0994            0.0853          0.0141
2           >100               [0.2, 0.8)            0.7910            0.8038          -0.0128
2           >100               >0.8                  0.9999            0.9999          0.0000
Table 4.4: Summary of simulation results of powers of Student’s t-test, Welch’s t-test and the Mann-Whitney U test under skewed distributions

Assumption  Distribution  Total sample size  Observed effect size  Student’s t-test  Welch’s t-test  Mann-Whitney U  Difference between t-tests
3           Skew-normal   <50                >0.8                  0.6889            0.6821          0.6912          0.0067
3           Skew-normal   [50, 100)          >0.8                  0.9261            0.9273          0.9402          -0.0012
3           Skew-normal   >100               >0.8                  0.9990            0.9991          0.9995          0.0000
4           Skew-normal   <50                [0.2, 0.8)            0.3583            0.4038          0.4476          -0.0455
4           Skew-normal   <50                >0.8                  0.8244            0.8583          0.8484          -0.0338
4           Skew-normal   [50, 100)          [0.2, 0.8)            0.5431            0.6685          0.7384          -0.1255
4           Skew-normal   [50, 100)          >0.8                  0.9837            0.9934          0.9930          -0.0097
4           Skew-normal   >100               [0.2, 0.8)            0.9020            0.9273          0.9734          -0.0253
4           Skew-normal   >100               >0.8                  1.0000            1.0000          1.0000          0.0000
4           SAS           <50                [0.2, 0.8)            0.3423            0.3659          0.3054          -0.0237
4           SAS           <50                >0.8                  0.8559            0.8753          0.8090          -0.0194
4           SAS           [50, 100)          [0.2, 0.8)            0.6001            0.6426          0.5270          -0.0425
4           SAS           [50, 100)          >0.8                  0.9949            0.9968          0.9827          -0.0019
4           SAS           >100               [0.2, 0.8)            0.9380            0.9423          0.8776          -0.0044
4           SAS           >100               >0.8                  1.0000            1.0000          1.0000          0.0000
4           Gamma         <50                [0.2, 0.8)            0.5044            0.5700          0.5808          -0.0656
4           Gamma         <50                >0.8                  0.8175            0.8257          0.8215          -0.0082
4           Gamma         [50, 100)          [0.2, 0.8)            0.7736            0.8264          0.9124          -0.0528
4           Gamma         [50, 100)          >0.8                  0.9500            0.9684          0.9383          -0.0183
4           Gamma         >100               [0.2, 0.8)            0.9897            0.9944          0.9997          -0.0047
4           Gamma         >100               >0.8                  0.9948            0.9948          0.9980          0.0000
4.3.2.2 Required sample sizes to achieve a power of 80%
Two sets of required sample sizes to achieve a power of 80% for Student’s t-test and Welch’s t-test are calculated by controlling either the effect sizes (Table 4.21 in appendix) or the true mean differences in the populations (Table 4.22 in appendix).
Given the same effect size values, Welch’s t-test requires fewer observations than Student’s t-test (Figure 4.8), and this finding is consistent across all parameter configurations of the two groups (Table 4.21 in appendix). Independently of the homogeneity of variances and the experimental design (balanced/unbalanced), Welch’s t-test always outperforms Student’s t-test in requiring the fewest observations in the control group to gain the same power. This is consistent with previous findings in Table 4.21 in appendix: the difference between the two group means always needs to be larger for Welch’s t-test than for Student’s t-test if their effect size estimators have the same values (Cohen’s ds = Shieh’s d); therefore, fewer observations are required to detect the larger mean differences and gain the same power.
To gain a power above 80% in detecting the same mean difference, Student’s t-test and Welch’s t-test require similar sample sizes either under balanced designs or when assuming equal variances in the two groups (Table 4.22 in appendix). The two t-tests differ in the minimum sample sizes when we are working with two unbalanced groups without assuming homogeneity of variances. When there are fewer/more observations in the group with the smaller/bigger standard deviation (e.g. SDR < 1 and SSR < 1), Welch’s t-test requires fewer observations (Figure 4.8, Table 4.22 in appendix); on the contrary, when the larger variance is associated with the smaller group (e.g. SDR < 1, SSR > 1), Student’s t-test is preferred as it requires fewer observations to reach the same power (Figure 4.8, Table 4.22 in appendix).
To sum up, the power analysis shows that Welch’s t-test is slightly less powerful than Student’s t-test under the normality assumption, but it is clearly more powerful than Student’s t-test under non-parametric conditions, especially when both the assumptions of normality and homogeneity of variances are violated. The simulation results also confirm that Welch’s t-test more frequently achieves larger power rates than the Mann-Whitney U test under both the homoscedasticity and the heteroscedasticity frameworks for non-normal distributions.
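The minimum sample size for a target power can also be obtained directly from the noncentral t-distribution rather than by simulation. A sketch for the simplest case (Student’s t-test, balanced design, equal variances; the α and target power match the values used in this chapter):

```python
import numpy as np
from scipy import stats

def required_n(d, alpha=0.05, power=0.80):
    """Smallest per-group n for a two-sided, two-sample Student t-test
    (balanced design, equal variances) to reach the target power."""
    for n in range(2, 100000):
        df = 2 * n - 2
        ncp = d * np.sqrt(n / 2)              # noncentrality parameter
        t_crit = stats.t.ppf(1 - alpha / 2, df)
        achieved = (1 - stats.nct.cdf(t_crit, df, ncp)
                    + stats.nct.cdf(-t_crit, df, ncp))
        if achieved >= power:
            return n
    return None

print(required_n(0.5))  # medium effect
print(required_n(0.2))  # a small effect needs far more observations
```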
[Figure 4.8: Required sample size (0-2500) to gain a power of 80% for Student’s t-test and Welch’s t-test, as a function of effect size and mean difference (0.1, 0.3, 0.5, 0.8, 1), for configurations such as SDR = 0.5 with SSR = 0.5 and SSR = 2. SDR: Standard Deviation Ratio; SSR: Sample Size Ratio.]
5 Discussion
In applied research, reporting effect sizes accompanied by the corresponding confidence intervals has steadily shown its importance in communicating the practical significance of results (Lakens, 2013). Previous research has highlighted that the choice of effect size measures and of approaches to build the associated confidence intervals is highly dependent on the specific conditions (Algina et al., 2005; Cumming & Finch, 2001; Grissom & Kim, 2001; Lakens, 2013; Shieh, 2013); unfortunately, few studies have properly reported effect sizes over the last several decades (Kotrlik et al., 2011). It is therefore of great relevance to compare different effect size indicators and confidence interval approaches under diverse conditions, in order to provide researchers with a guideline for choosing proper effect size measures and their confidence intervals.
In the context of group comparisons, Cohen’s ds, which relies on the often untenable assumptions of normality and homogeneity of variances, is the dominant effect size measure used by researchers. However, the consequences of the arbitrary use of Cohen’s ds when the assumptions are violated, and the superiority of alternatives to Cohen’s ds under specific conditions, remain vague. This study conducted a comprehensive comparison through Monte Carlo simulations of Cohen’s ds and its alternatives across assumptions, regarding the point estimators, confidence intervals and related statistical tests.
Firstly, we explored how Cohen’s ds and other effect size indices react in terms of bias, consistency and variance across the different assumptions of normality and homogeneity of variances. Based on a systematic review of proposed effect size indices under different assumptions (summarised in Table 2.1), we conducted a Monte Carlo simulation study of effect size estimators including Cohen’s ds, Hedge’s gs, Glass’s Δ and Shieh’s d under normal conditions, and Cohen’s ds, Hedge’s gs, Shieh’s d, dMAD, dR and PSindep under non-normal distributions. The simulation results reveal that Shieh’s d is the best parametric effect size measure: it mostly performs with bias rates comparable to or lower than those of Cohen’s ds and maintains higher precision across almost all assumptions. The only exception is when the group with fewer observations has the larger variance under the normality assumption, where Shieh’s d appears more biased than Cohen’s ds. On the other hand, the higher bias rates of Cohen’s ds under non-normality warn that using classic methods for mean comparisons and assuming all is well is evidently unsatisfactory.
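The parametric estimators just discussed differ only in their standardisers. A sketch of the sample versions in Python, following the standard definitions in Cohen (1988), Hedges and Olkin (1985) and Shieh (2013), with the common approximation for the bias-correction factor; the example data are arbitrary:

```python
import numpy as np

def cohens_ds(x, y):
    """Cohen's ds: mean difference over the pooled standard deviation."""
    n1, n2 = len(x), len(y)
    sp = np.sqrt(((n1 - 1) * np.var(x, ddof=1) + (n2 - 1) * np.var(y, ddof=1))
                 / (n1 + n2 - 2))
    return (np.mean(x) - np.mean(y)) / sp

def hedges_gs(x, y):
    """Hedge's gs: Cohen's ds with the small-sample bias correction
    J ~ 1 - 3 / (4 df - 1)."""
    df = len(x) + len(y) - 2
    return cohens_ds(x, y) * (1 - 3 / (4 * df - 1))

def shiehs_d(x, y):
    """Shieh's d: standardiser weights the group variances by the
    sample-size proportions q_i = n_i / N."""
    n1, n2 = len(x), len(y)
    q1, q2 = n1 / (n1 + n2), n2 / (n1 + n2)
    denom = np.sqrt(np.var(x, ddof=1) / q1 + np.var(y, ddof=1) / q2)
    return (np.mean(x) - np.mean(y)) / denom

x = np.array([4.1, 5.0, 6.2, 5.5, 4.8, 6.0])
y = np.array([3.2, 4.1, 3.8, 4.6, 3.5, 4.4])
print(cohens_ds(x, y), hedges_gs(x, y), shiehs_d(x, y))
```

Note that for a balanced design with equal sample variances, shiehs_d returns exactly half of cohens_ds, the inconsistency discussed below.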
Notably, our analysis did not confirm the expected advantages of recommended effect size point estimators such as dMAD, dR and PSindep when dealing with non-normally distributed data. One possible explanation is that our simulation configurations did not include extremely skewed data, where using medians or the techniques of trimming and winsorisation may offer great power for handling outliers. In this respect, further explorations could include more distinct distributions with extremely heavy tails and test more values of the trimming percentage, such as 10%, 15% and 30%. Furthermore, it is not convincing that PSindep can be used as a default alternative to Cohen’s ds in all non-parametric cases, because 1) it does not reflect the difference between two group means but between two distributions, and 2) its robustness to non-normality is threatened by heteroscedasticity. Concisely, it is worth noting that different methods are sensitive to different features of the data and can provide different perspectives of practical importance. Hence, more modern techniques with practical advantages have to be developed.
The simulation results provide strong evidence to support the use of Shieh’s d as the default effect size measure when we cannot make any assumptions about the data. However, Cumming (2013) contended that its standardiser has two serious deficiencies. Firstly, in the case of equal variances and a balanced design, Shieh’s d does not reduce to Cohen’s ds as most commonly defined (in fact, Shieh’s d = ½ Cohen’s ds). Therefore, it lacks consistency with the most familiar version of Cohen’s ds. Secondly, it does not give a readily interpretable version of Cohen’s ds, since the standardiser depends on the relative sample sizes, indexed by qi (i.e., choosing different sample sizes for the experiment changes the measurement unit of Shieh’s d). Evidently, further studies are needed to overcome these limitations.
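Concretely, with Shieh’s standardiser written as in Shieh (2013), the first objection follows from a one-line reduction: under equal variances (σ₁ = σ₂ = σ) and a balanced design (q₁ = q₂ = 1/2, where qᵢ = nᵢ/N),

```latex
d_{\mathrm{Shieh}}
  = \frac{\mu_1 - \mu_2}{\sqrt{\sigma_1^2/q_1 + \sigma_2^2/q_2}}
  = \frac{\mu_1 - \mu_2}{\sqrt{2\sigma^2 + 2\sigma^2}}
  = \frac{\mu_1 - \mu_2}{2\sigma}
  = \tfrac{1}{2}\,\frac{\mu_1 - \mu_2}{\sigma}
  = \tfrac{1}{2}\, d_{\mathrm{Cohen}} ,
```

so choosing different sample-size proportions rescales the measurement unit of the index.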
One of the major challenges in the simulation study is that it is not feasible to unify the parameter configurations under normal and non-normal distributions, as non-normal distributions such as the SAS-normal and Gamma are not defined by a mean and a standard deviation. As a consequence, the behaviour of Cohen’s ds under the normality and non-normality assumptions is not directly comparable. To extend this study, we believe that within-estimator comparisons controlling all parameters across distributions would be a meaningful next step.
After comparing the performance of the effect size point estimators, we assessed the accuracy and precision of four confidence intervals constructed through parametric and non-parametric
approaches. Under the assumption of normality, the NC method around Shieh’s d is considered the optimal method because it always gives the most accurate interval estimates regardless of experimental design and sample size, while the BS BCa interval around Cohen’s ds shows great advantages under non-parametric conditions, especially when the observations are adequate. In terms of precision, NC intervals around Shieh’s d are the narrowest intervals of all.
Notably, the configurations of the mean difference in our simulations only represent right-shifted cases, and in the analysis we should take into account that some observed tendencies are related to this fact. For instance, the observed standard deviations of the lower confidence bounds are always larger than those of the upper bounds, and the difference increases with the magnitude of the location shift (Tables 4.5 - 4.16 in appendix), but we would not assume this to be a universal rule. We would expect to observe the opposite trend when the location of the second distribution is left-shifted.
Lastly, we compared type I error rates and empirical powers (including the minimum sample sizes to obtain certain power levels) between Student’s t-test and Welch’s t-test, since an effect size is always associated with a hypothesis test. Overall, Welch’s t-test outperforms Student’s t-test in type I error rates, at a small cost in power, yet it requires fewer observations than Student’s t-test to achieve the same power. The exceptions are consistent with the findings on Shieh’s d when there are fewer observations in the group with the larger standard deviation. Remarkably, the non-parametric test (Mann-Whitney U test) produces higher type I error rates under non-normal conditions in the simulation study, possibly for two reasons: 1) we generated the null hypothesis by setting the mean difference equal to zero, which is not exactly the null hypothesis of the Mann-Whitney U test; 2) although the Mann-Whitney U test leaves the data distribution free, it still assumes that the two distributions have the same shape, so unequal shapes will lead to rejection of the hypothesis of equal distributions (Zimmerman, 2003). As extensions to the simulation study of hypothesis tests, we propose to include more significance levels (e.g., α = 0.025, 0.01), power rates (e.g., power = 85%, 90%, 95%) and diverse data types.
In summary, this study comprehensively analysed and compared the effect size and confidence interval estimates of Cohen’s ds and its alternatives across the different assumptions of normality and homogeneity of variances in two independent groups. It contributes to the effect size literature by providing a guideline for choosing the appropriate effect size estimator and confidence interval approach to convey quantitative information in applied research.
6 References
Algina, J., Keselman, H. J., & Penfield, R. D. (2005). An alternative to Cohen's standardized
mean difference effect size: A robust parameter and confidence interval in the two
independent groups case. Psychological Methods, 10(3), 317-328. DOI:
10.1037/1082-989X.10.3.317
Azzalini, A. (1985). A class of distributions which includes the normal ones. Scandinavian
Journal of Statistics, 12(2), 171-178. Retrieved from http://www.jstor.org/stable/4615982
Charness, G., Gneezy, U. & Kuhn, M. (2012). Experimental methods: Between-subject and
within-subject design. Journal of Economic Behavior and Organization, 81(1), 1-8. DOI:
10.1016/j.jebo.2011.08.009
Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences, Academic Press;
Cambridge, Massachusetts. DOI: 10.1016/C2013-0-10517-X
Cumming, G. (2013). Cohen’s d needs to be readily interpretable: comment on Shieh.
Behavior Research Methods, 45(4), 968-971. DOI: 10.3758/s13428-013-0392-4
Cumming, G., & Finch, S. (2001). A primer on the understanding, use, and calculation of
confidence intervals that are based on central and noncentral distributions. Educational
and Psychological Measurement, 61(4), 532–574. DOI: 10.1177/0013164401614002
Cumming, G., & Calin-Jageman, R. (2017). Introduction to the New Statistics: Estimation, Open Science, and Beyond. Routledge; New York.
Delacre, M., Lakens, D., & Leys, C. (2017). Why psychologists should by default use Welch’s
t-test instead of Student’s t-test. International Review of Social Psychology, 30(1), 92–101.
DOI: 10.5334/irsp.82
Ferguson, C. J. (2009). An effect size primer: A guide for clinicians and researchers.
Professional Psychology: Research & Practice, 40, 532-538. DOI: 10.1037/a0015808
Glass, G. (1976). Primary, secondary, and meta-analysis of research. Educational Researcher,
5(10), 3-8. Retrieved from http://www.jstor.org/stable/1174772
Grissom, R. J., & Kim, J. J. (2005). Effect sizes for research : a broad practical approach.
Lawrence Erlbaum Associates, Mahwah, N.J.; London
Grissom, R. J., & Kim, J. J. (2001). Review of assumptions and problems in the appropriate
conceptualization of effect size. Psychological Methods, 6(2), 135-146. DOI:
10.1037/1082-989X.6.2.135
Harrison, D., & Brady, A. (2004). Sample size and power calculations using the noncentral t-distribution. The Stata Journal, 4(2), 142-153. DOI: 10.1177/1536867X0400400205
Hedges, L. V., & Olkin, I. (1985). Statistical Methods for Meta-analysis. Academic Press;
Cambridge, Massachusetts. DOI: 10.1016/C2009-0-03396-0
Jones, M., & Pewsey, A. (2009). Sinh-arcsinh distributions. Biometrika, 96(4), 761-780.
Retrieved from http://www.jstor.org/stable/27798865
Kelley, K. (2005). The effects of nonnormal distributions on confidence intervals around the
standardized mean difference: Bootstrap and parametric confidence intervals. Educational
and Psychological Measurement, 65(1), 51–69. DOI: 10.1177/0013164404264850
Kelley, K., & Preacher, K. (2012). On effect size. Psychological Methods, 17(2), Jun 2012,
137-152. DOI: 10.1037/a0028086
Kotrlik, J.W., Williams, H.A. & Jabor, M.K. (2011). Reporting and interpreting effect size in
quantitative agricultural education research. Journal of Agricultural Education, 52(1),
132–142. DOI: 10.5032/jae.2011.01132
Kim, S., & Cohen, A. (1998). On the Behrens-Fisher problem: A review. Journal of
Educational and Behavioral Statistics, 23(4), 356-377. Retrieved from http://
www.jstor.org/stable/1165281
Kulinskaya, E. & Staudte, R. G. (2007). Confidence intervals for the standardized effect
arising in the comparison of two normal populations. Statistics in Medicine, 26(14),
2853-71. DOI: 10.1002/sim.2751
Lakens, D. (2013). Calculating and reporting effect sizes to facilitate cumulative science: A
practical primer for t-tests and ANOVAs. Frontiers in Psychology, 4, Article ID 863. DOI:
10.3389/fpsyg.2013.00863
Leys, C., Ley, C., Klein, O., Bernard, P., & Licata, L. (2013). Detecting outliers: do not use
standard deviation around the mean, use absolute deviation around the median. Journal of
Experimental Social Psychology, 49(4), 764–766. DOI: 10.1016/j.jesp.2013.03.013
Shieh, G. (2013). Confidence intervals and sample size calculations for the standardized mean
difference effect size between two normal populations under heteroscedasticity. Behavior
Research Methods, 45(4), 955-967. DOI: 10.3758/s13428-013-0320-7
Steiger, J. H., & Fouladi, R. T. (1997). Noncentrality interval estimation and the evaluation of
statistical models. In Harlow, L. L., Mulaik, S. A. & Steiger, J. H. (Eds.)(2016). What if
there were no significance tests? Routledge; New York.
Keselman, H.J., Wilcox, R.R., Othman, A.R., & Fradette, K. (2002). Trimming, transforming statistics, and bootstrapping: Circumventing the biasing effects of heteroscedasticity and nonnormality. Journal of Modern Applied Statistical Methods, 1(2), 288-309. DOI: 10.22237/jmasm/1036109820
Zimmerman, D.W. (2003). A warning about the large-sample Wilcoxon-Mann-Whitney Test.
Understanding Statistics, 2(4), 267-280. DOI: 10.1207/S15328031US0204_03
! of !44 44
APPENDIX
Figure 1: Distributions in the simulation study
Table 4.1: Bias rates and variances of ES estimators under Assumption 1 (assume σ1 = σ2 = 1, μ2 = 0)
Table 4.2: Bias rates and variances of ES estimators under Assumption 2 (assume σ2 = 1, μ2 = 0)
Table 4.3: Bias rates and variances of ES estimators under Assumption 3 (assume σ1 = σ2 = 1, μ2 = 0)
Table 4.4: Bias rates and variances of ES estimators under Assumption 4
Table 4.5: NC and BS BCa CIs under Normal distributions (assume n1 = n2 = 10, σ1 = σ2 = 1)
Table 4.6: NC and BS BCa CIs under Normal distributions (assume n1 = n2 = 10, σ1 = 2, σ2 = 1)
Table 4.7: NC and BS BCa CIs under Normal distributions (assume n1 = 20, n2 = 10, σ1 = σ2 = 1)
Table 4.8: NC and BS BCa CIs under Normal distributions (assume n1 = 20, n2 = 10, σ1 = 2, σ2 = 1)
Table 4.9: NC and BS BCa CIs under Normal distributions (assume n1 = 75, n2 = 50, σ1 = σ2 = 1)
Table 4.10: NC and BS BCa CIs under Normal distributions (assume n1 = 75, n2 = 50, σ1 = 2, σ2 = 1)
Table 4.11: NC and BS BCa CIs under Skew-normal distributions (assume n1 = n2 = 10, σ1 = σ2 = 1)
Table 4.12: NC and BS BCa CIs under Skew-normal distributions (assume n1 = n2 = 10, σ1 = 2, σ2 = 1)
Table 4.13: NC and BS BCa CIs under Skew-normal distributions (assume n1 = 20, n2 = 10, σ1 = σ2 = 1)
Table 4.14: NC and BS BCa CIs under Skew-normal distributions (assume n1 = 20, n2 = 10, σ1 = 2, σ2 = 1)
Table 4.15: NC and BS BCa CIs under Skew-normal distributions (assume n1 = 75, n2 = 50, σ1 = σ2 = 1)
Table 4.16: NC and BS BCa CIs under Skew-normal distributions (assume n1 = 75, n2 = 50, σ1 = 2, σ2 = 1)
Table 4.17: Type I error rates and powers in Student's and Welch's t-tests (Normal distributions)
Table 4.18: Type I error rates and powers in Student's and Welch's t-tests, and Mann-Whitney U tests (Skew-normal distributions)
Table 4.19: Type I error rates and powers in Student's and Welch's t-tests, and Mann-Whitney U tests (Sinh-Arcsinh normal distributions)
Table 4.20: Type I error rates and powers in Student's and Welch's t-tests, and Mann-Whitney U tests (Gamma distributions)
Table 4.21: Required sample sizes in t-tests to achieve 80% power (under constant effect sizes)
Table 4.22: Required sample sizes in t-tests to achieve 80% power (under constant mean difference)
Syntax 1: R-functions to obtain confidence intervals
Syntax 2: R-functions to calculate the required sample size
Figure 1: Distributions in the simulation study
Note.
(a) Normal densities with SD = 1 and increasing mean differences via a right location shift: mean difference = 0.1, 0.5, 0.8, 1.5, 2, 5;
(b) Normal densities with mean difference = 2 and varying SDs: SD = 0.5 and 1.5;
(c) Skew-normal densities with location shifts = -1 and -2 and skewness = 2 and 3, respectively;
(d) Sinh-Arcsinh normalised densities σ_{ε,1} f_{ε,1}(σ_{ε,1} x + μ_{ε,1}) with ε = -0.5 and 1;
(e) Gamma densities with shape = 2 and scales = 0.5 and 1.
[Density plots omitted: (a) Normal densities with equal variances; (b) Normal densities with unequal variances (sd = 0.5, sd = 1.5); (c) Skew-normal densities (skewness = 2, 3); (d) Sinh-Arcsinh normal densities (ε = 1, ε = -0.5); (e) Gamma densities (scale = 1, scale = 0.5).]
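For readers who want to reproduce draws from the benchmark distributions of Figure 1, the following Python sketch samples from each family. The thesis's own simulations were run in R; the seed, the sample size, and fixing the Sinh-Arcsinh tail parameter δ at 1 (so that ε alone controls the skew) are assumptions here, not the thesis's settings.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2024)  # arbitrary seed (assumption)
n = 10_000

# (a)/(b) Normal draws with a location shift and unequal SDs
normal_a = rng.normal(loc=0.5, scale=1.0, size=n)
normal_b = rng.normal(loc=2.0, scale=0.5, size=n)

# (c) Skew-normal with shape (skewness) parameter a = 2 and location shift -1
skewn = stats.skewnorm.rvs(a=2, loc=-1, size=n, random_state=rng)

# (d) Sinh-Arcsinh transform of a standard normal, X = sinh(arcsinh(Z) + eps):
#     the Jones-Pewsey family with tail parameter delta fixed at 1 (assumption)
z = rng.standard_normal(n)
sas = np.sinh(np.arcsinh(z) + 1.0)  # eps = 1 produces right skew

# (e) Gamma with shape = 2 and scale = 0.5 (mean = shape * scale = 1)
gam = stats.gamma.rvs(a=2, scale=0.5, size=n, random_state=rng)
```

A positive ε skews the Sinh-Arcsinh density to the right and a negative ε to the left, matching the ε = 1 and ε = -0.5 panels.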
Table 4.1: Bias rates and variances of ES estimators under Assumption 1 (assume σ1=σ2=1, µ2 = 0)
n1 n2 | μ1 − μ2 | Cohen's ds: Bias rate, Variance | Hedges's gs: Bias rate, Variance | Glass's ds: Bias rate, Variance | Shieh's d: Bias rate, Variance
10 10
0.1 0.0349 0.2260 0.0347 0.2073 0.0875 0.2592 0.0349 0.0565
0.5 0.0403 0.2345 0.0402 0.2150 0.0890 0.2786 0.0403 0.0586
0.8 0.0449 0.2470 0.0447 0.2265 0.0949 0.3138 0.0449 0.0617
1.5 0.0407 0.3013 0.0406 0.2763 0.0906 0.4511 0.0407 0.0753
2.0 0.0437 0.3653 0.0436 0.3350 0.0935 0.6096 0.0437 0.0913
5.0 0.0438 1.0911 0.0437 1.0006 0.0947 2.4913 0.0438 0.2728
15 10
0.1 0.0220 0.1832 0.0219 0.1713 0.0859 0.2169 0.0286 0.0447
0.5 0.0360 0.1900 0.0359 0.1776 0.0968 0.2371 0.0418 0.0465
0.8 0.0365 0.1996 0.0364 0.1867 0.0963 0.2726 0.0420 0.0493
1.5 0.0350 0.2391 0.0350 0.2235 0.0950 0.4119 0.0407 0.0604
2.0 0.0346 0.2867 0.0345 0.2681 0.0958 0.5697 0.0406 0.0736
5.0 0.0346 0.8322 0.0345 0.7782 0.0950 2.4082 0.0404 0.2263
20 10
0.1 0.0278 0.1628 0.0277 0.1541 0.0859 0.1942 0.0383 0.0375
0.5 0.0250 0.1657 0.0250 0.1569 0.0932 0.2146 0.0399 0.0388
0.8 0.0308 0.1729 0.0308 0.1637 0.0976 0.2483 0.0453 0.0414
1.5 0.0273 0.2067 0.0273 0.1956 0.0916 0.3864 0.0408 0.0525
2.0 0.0259 0.2418 0.0259 0.2289 0.0916 0.5385 0.0400 0.0646
5.0 0.0284 0.6780 0.0283 0.6418 0.0946 2.4020 0.0424 0.2120
20 20
0.1 0.0171 0.1052 0.0171 0.1010 0.0375 0.1116 0.0171 0.0263
0.5 0.0178 0.1084 0.0177 0.1041 0.0392 0.1190 0.0178 0.0271
0.8 0.0214 0.1149 0.0214 0.1104 0.0432 0.1326 0.0214 0.0287
1.5 0.0206 0.1387 0.0206 0.1333 0.0418 0.1845 0.0206 0.0347
2.0 0.0207 0.1626 0.0207 0.1562 0.0422 0.2397 0.0207 0.0407
5.0 0.0208 0.4711 0.0207 0.4526 0.0431 0.9233 0.0208 0.1178
30 20
0.1 0.0094 0.0873 0.0094 0.0846 0.0354 0.0939 0.0121 0.0211
0.5 0.0135 0.0897 0.0135 0.0869 0.0384 0.1007 0.0159 0.0218
0.8 0.0152 0.0942 0.0152 0.0913 0.0413 0.1140 0.0179 0.0231
"n1
Cohen’s !ds Glass’s !ds!μ1 − μ2
Hedges’s "gs"n2
Page ! of !2 52
30 20
1.5 0.0161 0.1120 0.0161 0.1085 0.0423 0.1667 0.0189 0.0281
2.0 0.0164 0.1327 0.0164 0.1286 0.0421 0.2228 0.0191 0.0339
5.0 0.0164 0.3713 0.0164 0.3597 0.0420 0.9027 0.0190 0.1006
40 20
0.1 0.0238 0.0777 0.0238 0.0757 0.0563 0.0842 0.0321 0.0176
0.5 0.0085 0.0799 0.0085 0.0778 0.0364 0.0918 0.0149 0.0183
0.8 0.0150 0.0832 0.0150 0.0811 0.0444 0.1047 0.0220 0.0195
1.5 0.0127 0.0977 0.0127 0.0952 0.0410 0.1556 0.0193 0.0244
2.0 0.0131 0.1152 0.0131 0.1122 0.0415 0.2138 0.0197 0.0302
5.0 0.0139 0.3083 0.0138 0.3003 0.0431 0.8944 0.0208 0.0954
50 50
0.1 0.0036 0.0409 0.0036 0.0402 0.0122 0.0418 0.0036 0.0102
0.5 0.0067 0.0420 0.0067 0.0414 0.0144 0.0443 0.0067 0.0105
0.8 0.0072 0.0440 0.0072 0.0434 0.0151 0.0485 0.0072 0.0110
1.5 0.0072 0.0526 0.0072 0.0518 0.0150 0.0662 0.0072 0.0131
2.0 0.0076 0.0626 0.0076 0.0617 0.0158 0.0866 0.0076 0.0157
5.0 0.0080 0.1731 0.0080 0.1705 0.0154 0.3170 0.0080 0.0433
75 50
0.1 0.0201 0.0338 0.0201 0.0334 0.0287 0.0347 0.0208 0.0081
0.5 0.0066 0.0349 0.0066 0.0345 0.0163 0.0374 0.0078 0.0084
0.8 0.0066 0.0362 0.0066 0.0358 0.0162 0.0414 0.0076 0.0088
1.5 0.0054 0.0432 0.0054 0.0427 0.0151 0.0594 0.0065 0.0108
2.0 0.0060 0.0507 0.0060 0.0501 0.0150 0.0786 0.0069 0.0128
5.0 0.0061 0.1381 0.0061 0.1364 0.0159 0.3116 0.0072 0.0374
100 50
0.1 0.0027 0.0303 0.0027 0.0300 0.0081 0.0312 0.0001 0.0068
0.5 0.0055 0.0313 0.0055 0.0310 0.0162 0.0340 0.0082 0.0071
0.8 0.0061 0.0326 0.0061 0.0322 0.0168 0.0384 0.0088 0.0075
1.5 0.0056 0.0387 0.0056 0.0383 0.0161 0.0567 0.0081 0.0095
2.0 0.0056 0.0445 0.0056 0.0441 0.0163 0.0757 0.0083 0.0115
5.0 0.0050 0.1160 0.0050 0.1149 0.0162 0.3064 0.0079 0.0355
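The four estimators benchmarked in Tables 4.1 and 4.2 can be computed directly from two samples. Below is a minimal Python sketch (the thesis's own Syntax appendix is in R; the approximate form of the Hedges correction and the choice of group 2 as the "control" in Glass's denominator are assumptions here):

```python
import numpy as np

def es_estimators(x, y):
    """Point estimates of Cohen's ds, Hedges's gs, Glass's ds and Shieh's d."""
    n1, n2 = len(x), len(y)
    m1, m2 = np.mean(x), np.mean(y)
    v1, v2 = np.var(x, ddof=1), np.var(y, ddof=1)
    # Cohen's ds: mean difference over the pooled SD
    sp = np.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    ds = (m1 - m2) / sp
    # Hedges's gs: small-sample bias correction of ds (approximate form)
    gs = ds * (1 - 3 / (4 * (n1 + n2) - 9))
    # Glass's ds: standardize by a single group's SD (group 2 taken as control)
    glass = (m1 - m2) / np.sqrt(v2)
    # Shieh's d: each variance weighted by its group's sample fraction
    q1, q2 = n1 / (n1 + n2), n2 / (n1 + n2)
    shieh = (m1 - m2) / np.sqrt(v1 / q1 + v2 / q2)
    return ds, gs, glass, shieh
```

With equal group sizes and equal sample variances, Shieh's d equals exactly half of Cohen's ds, which is why its variances in Table 4.1 sit near one quarter of Cohen's.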
Table 4.2: Bias rates and variances of ES estimators under Assumption 2 (assume σ2=1, µ2 = 0)
n1 n2 σ1 | μ1 − μ2 | Cohen's ds: Bias rate, Variance | Hedges's gs: Bias rate, Variance | Glass's ds: Bias rate, Variance | Shieh's d: Bias rate, Variance
10 10
0.5
0.1 0.0574 0.2342 0.0573 0.2147 0.0914 0.1608 0.0574 0.0585
0.5 0.0621 0.2515 0.0620 0.2306 0.0974 0.1823 0.0621 0.0629
0.8 0.0626 0.2840 0.0625 0.2604 0.0972 0.2174 0.0626 0.0710
1.5 0.0584 0.4046 0.0583 0.3710 0.0930 0.3592 0.0584 0.1011
2.0 0.0589 0.5424 0.0588 0.4974 0.0939 0.5211 0.0589 0.1356
5.0 0.0593 2.1453 0.0592 1.9675 0.0939 2.3901 0.0593 0.5363
1.5
0.1 0.0884 0.2284 0.0883 0.2095 0.1274 0.4205 0.0884 0.0571
0.5 0.0430 0.2341 0.0429 0.2147 0.0873 0.4387 0.0430 0.0585
0.8 0.0493 0.2464 0.0492 0.2260 0.0919 0.4729 0.0493 0.0616
1.5 0.0485 0.2824 0.0484 0.2590 0.0923 0.6158 0.0485 0.0706
2.0 0.0508 0.3270 0.0507 0.2999 0.0959 0.7740 0.0508 0.0818
5.0 0.0508 0.8301 0.0507 0.7613 0.0945 2.6408 0.0508 0.2075
15 10
0.5
0.1 0.0443 0.2451 0.0442 0.2292 0.0895 0.1508 0.0608 0.0477
0.5 0.0504 0.2589 0.0504 0.2420 0.0957 0.1703 0.0668 0.0515
0.8 0.0497 0.2877 0.0497 0.2690 0.0952 0.2058 0.0663 0.0589
1.5 0.0484 0.3997 0.0484 0.3738 0.0941 0.3499 0.0650 0.0883
2.0 0.0466 0.5176 0.0466 0.4840 0.0916 0.5010 0.0629 0.1193
5.0 0.0480 1.9608 0.0479 1.8334 0.0942 2.3890 0.0648 0.5004
1.5
0.1 0.0384 0.1561 0.0383 0.1460 0.0961 0.3223 0.0348 0.0436
0.5 0.0374 0.1610 0.0373 0.1506 0.0944 0.3445 0.0335 0.0448
0.8 0.0377 0.1687 0.0376 0.1577 0.0946 0.3821 0.0339 0.0468
1.5 0.0376 0.1926 0.0375 0.1801 0.0949 0.5213 0.0339 0.0528
2.0 0.0386 0.2247 0.0385 0.2101 0.0958 0.6749 0.0349 0.0609
5.0 0.0381 0.5616 0.0380 0.5251 0.0951 2.5562 0.0342 0.1460
20 10
0.5
0.1 0.0325 0.2555 0.0325 0.2419 0.0840 0.1456 0.0604 0.0402
0.5 0.0432 0.2711 0.0432 0.2566 0.0974 0.1676 0.0728 0.0445
0.8 0.0418 0.2947 0.0417 0.2790 0.0972 0.2041 0.0719 0.0513
1.5 0.0407 0.3961 0.0407 0.3750 0.0948 0.3451 0.0702 0.0786
2.0 0.0392 0.4991 0.0391 0.4724 0.0917 0.4931 0.0677 0.1071
5.0 0.0410 1.8138 0.0410 1.7169 0.0951 2.3741 0.0705 0.4678
1.5
0.1 0.0200 0.1250 0.0200 0.1184 0.0819 0.2757 0.0195 0.0363
0.5 0.0349 0.1273 0.0349 0.1205 0.0976 0.2934 0.0344 0.0368
0.8 0.0322 0.1325 0.0322 0.1254 0.0973 0.3331 0.0326 0.0385
1.5 0.0315 0.1533 0.0315 0.1452 0.0961 0.4776 0.0316 0.0445
2.0 0.0304 0.1745 0.0303 0.1652 0.0930 0.6291 0.0300 0.0505
5.0 0.0301 0.4316 0.0300 0.4086 0.0925 2.4589 0.0297 0.1238
20 20
0.5
0.1 0.0316 0.1074 0.0316 0.1032 0.0458 0.0698 0.0316 0.0269
0.5 0.0270 0.1160 0.0270 0.1114 0.0412 0.0783 0.0270 0.0290
0.8 0.0273 0.1273 0.0273 0.1223 0.0414 0.0901 0.0273 0.0318
1.5 0.0276 0.1793 0.0276 0.1723 0.0420 0.1426 0.0276 0.0448
2.0 0.0273 0.2361 0.0272 0.2268 0.0416 0.1999 0.0273 0.0590
5.0 0.0276 0.9078 0.0276 0.8721 0.0421 0.8808 0.0276 0.2270
1.5
0.1 0.0287 0.1061 0.0286 0.1020 0.0482 0.1816 0.0287 0.0265
0.5 0.0248 0.1090 0.0248 0.1047 0.0421 0.1889 0.0248 0.0272
0.8 0.0272 0.1137 0.0272 0.1092 0.0465 0.2043 0.0272 0.0284
1.5 0.0227 0.1289 0.0227 0.1238 0.0404 0.2527 0.0227 0.0322
2.0 0.0243 0.1480 0.0243 0.1422 0.0413 0.3086 0.0243 0.0370
5.0 0.0243 0.3639 0.0243 0.3496 0.0428 0.9861 0.0243 0.0910
30 20
0.5
0.1 0.0245 0.1136 0.0245 0.1100 0.0436 0.0655 0.0319 0.0217
0.5 0.0230 0.1209 0.0230 0.1171 0.0419 0.0733 0.0303 0.0236
0.8 0.0233 0.1334 0.0233 0.1292 0.0421 0.0862 0.0306 0.0267
1.5 0.0235 0.1810 0.0235 0.1754 0.0426 0.1383 0.0309 0.0388
2.0 0.0226 0.2310 0.0226 0.2238 0.0414 0.1929 0.0299 0.0515
5.0 0.0235 0.8638 0.0235 0.8369 0.0427 0.8807 0.0310 0.2119
1.5
0.1 0.0129 0.0745 0.0129 0.0722 0.0356 0.1397 0.0107 0.0208
0.5 0.0176 0.0759 0.0176 0.0735 0.0409 0.1463 0.0156 0.0211
0.8 0.0174 0.0799 0.0174 0.0774 0.0419 0.1621 0.0157 0.0222
1.5 0.0171 0.0915 0.0171 0.0886 0.0398 0.2113 0.0150 0.0250
2.0 0.0171 0.1033 0.0170 0.1001 0.0402 0.2654 0.0151 0.0280
5.0 0.0177 0.2560 0.0177 0.2480 0.0406 0.9376 0.0156 0.0661
40 20
0.5
0.1 0.0029 0.1198 0.0029 0.1167 0.0239 0.0632 0.0148 0.0182
0.5 0.0188 0.1259 0.0188 0.1226 0.0412 0.0707 0.0317 0.0198
0.8 0.0187 0.1376 0.0187 0.1340 0.0409 0.0836 0.0314 0.0227
1.5 0.0199 0.1817 0.0199 0.1770 0.0420 0.1355 0.0325 0.0340
2.0 0.0194 0.2285 0.0194 0.2226 0.0411 0.1902 0.0318 0.0459
5.0 0.0190 0.8161 0.0190 0.7950 0.0408 0.8699 0.0315 0.1943
1.5
0.1 0.0211 0.0599 0.0211 0.0583 0.0511 0.1185 0.0217 0.0173
0.5 0.0135 0.0617 0.0135 0.0601 0.0397 0.1270 0.0130 0.0178
0.8 0.0155 0.0638 0.0155 0.0621 0.0430 0.1397 0.0154 0.0184
1.5 0.0152 0.0724 0.0152 0.0705 0.0418 0.1921 0.0147 0.0208
2.0 0.0146 0.0814 0.0146 0.0793 0.0417 0.2471 0.0144 0.0234
5.0 0.0149 0.1989 0.0149 0.1937 0.0426 0.9392 0.0149 0.0573
50 50
0.5
0.1 0.0086 0.0412 0.0086 0.0405 0.0137 0.0261 0.0086 0.0103
0.5 0.0086 0.0437 0.0086 0.0430 0.0137 0.0286 0.0086 0.0109
0.8 0.0096 0.0490 0.0096 0.0482 0.0148 0.0334 0.0096 0.0122
1.5 0.0104 0.0667 0.0104 0.0657 0.0155 0.0505 0.0104 0.0167
2.0 0.0105 0.0871 0.0105 0.0858 0.0155 0.0698 0.0105 0.0218
5.0 0.0105 0.3340 0.0105 0.3289 0.0157 0.3049 0.0105 0.0835
1.5
0.1 0.0065 0.0409 0.0065 0.0403 0.0127 0.0677 0.0065 0.0102
0.5 0.0085 0.0420 0.0085 0.0413 0.0146 0.0707 0.0085 0.0105
0.8 0.0097 0.0431 0.0097 0.0424 0.0159 0.0742 0.0097 0.0108
1.5 0.0094 0.0496 0.0094 0.0488 0.0162 0.0929 0.0094 0.0124
2.0 0.0084 0.0561 0.0084 0.0553 0.0157 0.1124 0.0084 0.0140
5.0 0.0087 0.1345 0.0087 0.1324 0.0155 0.3422 0.0087 0.0336
75 50
0.5
0.1 0.0081 0.0434 0.0081 0.0429 0.0150 0.0243 0.0108 0.0082
0.5 0.0061 0.0466 0.0061 0.0460 0.0128 0.0272 0.0087 0.0090
0.8 0.0076 0.0502 0.0076 0.0496 0.0144 0.0312 0.0103 0.0099
1.5 0.0088 0.0684 0.0088 0.0675 0.0157 0.0492 0.0115 0.0144
2.0 0.0085 0.0873 0.0085 0.0862 0.0152 0.0680 0.0112 0.0191
5.0 0.0087 0.3204 0.0087 0.3165 0.0153 0.2997 0.0113 0.0767
1.5
0.1 0.0023 0.0290 0.0023 0.0287 0.0109 0.0521 0.0016 0.0081
0.5 0.0047 0.0298 0.0047 0.0294 0.0138 0.0549 0.0041 0.0083
0.8 0.0068 0.0309 0.0068 0.0305 0.0150 0.0591 0.0059 0.0086
1.5 0.0060 0.0353 0.0060 0.0349 0.0145 0.0770 0.0052 0.0097
2.0 0.0060 0.0400 0.0060 0.0395 0.0141 0.0964 0.0051 0.0109
5.0 0.0071 0.0961 0.0071 0.0949 0.0164 0.3275 0.0065 0.0248
100 50
0.5
0.1 0.0053 0.0461 0.0053 0.0456 0.0132 0.0235 0.0099 0.0069
0.5 0.0073 0.0487 0.0073 0.0482 0.0153 0.0262 0.0120 0.0075
0.8 0.0065 0.0526 0.0065 0.0521 0.0147 0.0304 0.0113 0.0085
1.5 0.0077 0.0693 0.0077 0.0686 0.0160 0.0481 0.0126 0.0125
2.0 0.0082 0.0879 0.0082 0.0870 0.0163 0.0677 0.0129 0.0170
5.0 0.0073 0.3069 0.0073 0.3038 0.0153 0.3001 0.0120 0.0703
1.5
0.1 0.0061 0.0236 0.0061 0.0234 0.0160 0.0446 0.0060 0.0068
0.5 0.0048 0.0240 0.0048 0.0237 0.0147 0.0470 0.0047 0.0069
0.8 0.0053 0.0248 0.0053 0.0246 0.0159 0.0516 0.0054 0.0072
1.5 0.0059 0.0281 0.0059 0.0279 0.0159 0.0692 0.0058 0.0081
2.0 0.0057 0.0323 0.0057 0.0319 0.0156 0.0891 0.0056 0.0092
5.0 0.0054 0.0751 0.0054 0.0744 0.0156 0.3190 0.0054 0.0215
Table 4.3: Bias rates and variances of ES estimators under Assumption 3 (assume σ1=σ2=1, µ2 = 0; Skew-normal distribution)
n1 n2 | μ1 − μ2 | Cohen's ds: Bias rate, Var. | Hedges's gs: Bias rate, Var. | Shieh's d: Bias rate, Var. | dMAD: Bias rate, Var. | dR: Bias rate, Var. | PSindep: Bias rate, Var.
10 10 -1 0.0708 0.3170 0.0707 0.2908 0.0708 0.0793 0.2589 1.0999 0.0763 0.4445 0.0001 0.0118
15 10 -1 0.0567 0.2530 0.0567 0.2366 0.0546 0.0590 0.2633 1.0108 0.0631 0.3468 -0.0011 0.0094
20 10 -1 0.0436 0.2172 0.0435 0.2056 0.0483 0.0483 0.2721 0.9442 0.0507 0.2865 0.0017 0.0084
20 20 -1 0.0321 0.1450 0.0321 0.1393 0.0321 0.0363 0.1125 0.3592 0.0335 0.1945 -0.0002 0.0057
30 20 -1 0.0248 0.1181 0.0248 0.1144 0.0238 0.0274 0.1155 0.3264 0.0248 0.1542 0.0013 0.0046
40 20 -1 0.0219 0.1028 0.0219 0.1002 0.0241 0.0226 0.1158 0.3122 0.0222 0.1329 -0.0002 0.0041
50 50 -1 0.0112 0.0551 0.0112 0.0542 0.0112 0.0138 0.0406 0.1158 0.0097 0.0719 0.0012 0.0023
75 50 -1 0.0098 0.0452 0.0098 0.0447 0.0092 0.0104 0.0420 0.1016 0.0099 0.0584 0.0003 0.0018
100 50 -1 0.0087 0.0395 0.0087 0.0391 0.0093 0.0086 0.0455 0.0954 0.0084 0.0501 -0.0001 0.0016
"n1
"PSindep"dMADHedges’s !gs"n2 !μ1 − μ2
"dRCohen’s !ds
Page ! of !8 52
Table 4.4: Bias rates and variances of ES estimators under Assumption 4
n1 n2 | μ1 − μ2 | Cohen's ds: Bias rate, Var. | Hedges's gs: Bias rate, Var. | Shieh's d: Bias rate, Var. | dMAD: Bias rate, Var. | dR: Bias rate, Var. | PSindep: Bias rate, Var.
Skew-normal (σ1 = 2, σ2 = 1, μ2 = (0, −1))
10 10 -2 0.1142 0.4989 0.1141 0.4575 0.1142 0.1247 0.2512 3.4940 0.1288 0.7297 -0.0004 0.0108
15 10 -2 0.0832 0.3118 0.0831 0.2915 0.0679 0.0848 0.2609 3.3201 0.0950 0.4450 0.0008 0.0076
20 10 -2 0.0672 0.2244 0.0672 0.2124 0.0504 0.0640 0.2725 3.0968 0.0739 0.3166 0.0001 0.0059
20 20 -2 0.0545 0.2235 0.0545 0.2147 0.0545 0.0559 0.1086 1.1569 0.0584 0.3147 0.0013 0.0054
30 20 -2 0.0401 0.1408 0.0401 0.1364 0.0327 0.0388 0.1090 1.0043 0.0425 0.1946 0.0008 0.0037
40 20 -2 0.0309 0.1023 0.0309 0.0997 0.0231 0.0295 0.1148 0.9272 0.0350 0.1391 0.0001 0.0029
50 50 -2 0.0201 0.0818 0.0201 0.0805 0.0201 0.0204 0.0387 0.3662 0.0193 0.1114 0.0012 0.0021
75 50 -2 0.0155 0.0533 0.0155 0.0527 0.0126 0.0148 0.0386 0.3128 0.0166 0.0735 0.0004 0.0015
100 50 -2 0.0121 0.0394 0.0121 0.0390 0.0090 0.0115 0.0445 0.2960 0.0123 0.0529 0.0008 0.0012
10 10 -1 0.1468 0.3356 0.1467 0.3078 0.1468 0.0839 0.2148 1.8313 0.1535 0.4404 -0.0008 0.0158
15 10 -1 0.1065 0.2088 0.1064 0.1952 0.0817 0.0575 0.2527 1.5430 0.1192 0.2718 -0.0005 0.0110
20 10 -1 0.0859 0.1512 0.0859 0.1431 0.0570 0.0436 0.2522 1.2746 0.1017 0.1935 -0.0015 0.0087
20 20 -1 0.0763 0.1509 0.0763 0.1449 0.0763 0.0377 0.1022 0.6094 0.0759 0.1911 -0.0028 0.0078
30 20 -1 0.0497 0.0960 0.0497 0.0930 0.0381 0.0269 0.1070 0.4775 0.0568 0.1219 0.0003 0.0054
40 20 -1 0.0416 0.0711 0.0415 0.0692 0.0274 0.0207 0.1077 0.3962 0.0482 0.0887 -0.0008 0.0042
50 50 -1 0.0263 0.0570 0.0263 0.0561 0.0263 0.0142 0.0364 0.1995 0.0249 0.0696 0.0006 0.0031
75 50 -1 0.0216 0.0362 0.0216 0.0357 0.0170 0.0103 0.0358 0.1514 0.0230 0.0444 -0.0008 0.0021
100 50 -1 0.0170 0.0273 0.0170 0.0271 0.0113 0.0081 0.0400 0.1280 0.0183 0.0334 -0.0001 0.0017
Sinh-Arcsinh normal ((σ1, σ2) = {(1.6, 1); (1.15, 1); (1.6, 1.15)}, μ2 = (0, 0, −0.7))
10 10 1.6 0.0297 0.1748 0.0296 0.1603 0.0297 0.0437 0.3803 1.6376 0.0871 0.2517 -0.0003 0.0097
15 10 1.6 0.0245 0.1206 0.0245 0.1128 0.0193 0.0360 0.3415 1.3428 0.0665 0.1696 -0.0004 0.0079
20 10 1.6 0.0212 0.0979 0.0211 0.0927 0.0184 0.0323 0.3292 1.2347 0.0537 0.1351 -0.0003 0.0069
20 20 1.6 0.0144 0.0790 0.0144 0.0759 0.0144 0.0198 0.1696 0.5076 0.0422 0.1062 0.0002 0.0046
30 20 1.6 0.0127 0.0558 0.0127 0.0541 0.0101 0.0168 0.1498 0.4281 0.0316 0.0738 0.0002 0.0037
40 20 1.6 0.0119 0.0462 0.0119 0.0450 0.0098 0.0153 0.1425 0.3838 0.0275 0.0609 0.0001 0.0034
50 50 1.6 0.0055 0.0302 0.0055 0.0297 0.0055 0.0075 0.0634 0.1621 0.0164 0.0394 0.0000 0.0018
"n1
"PSindep"dMADHedges’s !gs"n2
" , " , "σ1 = 2 σ2 = 1 μ2 = (0, − 1)
!μ1 − μ2
"dR
" , "(σ1, σ2) = {(1.6,1); (1.15,1); (1.6,1.15)} μ2 = (0,0, − 0.7)
Cohen’s !ds
Page ! of !9 52
75 50 1.6 0.0050 0.0219 0.0050 0.0216 0.0038 0.0066 0.0569 0.1380 0.0121 0.0289 0.0000 0.0015
100 50 1.6 0.0048 0.0182 0.0048 0.0180 0.0041 0.0060 0.0516 0.1208 0.0108 0.0237 0.0000 0.0013
10 10 -0.7 0.0142 0.2079 0.0141 0.1907 0.0142 0.0520 0.3926 0.7648 0.0808 0.2689 -0.0009 0.0150
15 10 -0.7 0.0126 0.1610 0.0125 0.1505 0.0200 0.0425 0.3503 0.6677 0.0545 0.2034 -0.0008 0.0123
20 10 -0.7 0.0074 0.1378 0.0073 0.1304 0.0222 0.0364 0.3276 0.5905 0.0410 0.1709 0.0014 0.0110
20 20 -0.7 0.0056 0.0976 0.0056 0.0937 0.0056 0.0244 0.1579 0.2607 0.0347 0.1191 -0.0001 0.0074
30 20 -0.7 0.0047 0.0764 0.0047 0.0740 0.0083 0.0202 0.1494 0.2177 0.0244 0.0928 0.0002 0.0061
40 20 -0.7 0.0054 0.0659 0.0054 0.0642 0.0128 0.0174 0.1411 0.1987 0.0207 0.0798 -0.0003 0.0054
50 50 -0.7 0.0039 0.0375 0.0039 0.0369 0.0039 0.0094 0.0597 0.0868 0.0160 0.0447 -0.0011 0.0029
75 50 -0.7 0.0001 0.0295 0.0001 0.0291 0.0014 0.0078 0.0582 0.0728 0.0064 0.0353 0.0009 0.0024
100 50 -0.7 0.0014 0.0257 0.0014 0.0254 0.0040 0.0067 0.0557 0.0650 0.0076 0.0306 0.0003 0.0021
10 10 2.3 0.0346 0.1596 0.0345 0.1464 0.0346 0.0399 0.3451 1.5958 0.0915 0.2530 -0.0004 0.0045
15 10 2.3 0.0296 0.1207 0.0295 0.1129 0.0235 0.0317 0.3263 1.4340 0.0762 0.1823 -0.0001 0.0036
20 10 2.3 0.0251 0.1032 0.0251 0.0977 0.0222 0.0280 0.3154 1.3081 0.0640 0.1509 -0.0002 0.0032
20 20 2.3 0.0177 0.0746 0.0177 0.0716 0.0177 0.0186 0.1482 0.4827 0.0464 0.1088 0.0002 0.0021
30 20 2.3 0.0144 0.0567 0.0144 0.0549 0.0112 0.0148 0.1391 0.4067 0.0356 0.0803 0.0000 0.0018
40 20 2.3 0.0124 0.0488 0.0124 0.0476 0.0107 0.0130 0.1337 0.3669 0.0314 0.0680 0.0001 0.0015
50 50 2.3 0.0072 0.0287 0.0072 0.0282 0.0072 0.0072 0.0547 0.1466 0.0177 0.0406 0.0001 0.0008
75 50 2.3 0.0056 0.0223 0.0056 0.0220 0.0043 0.0057 0.0529 0.1252 0.0139 0.0307 -0.0001 0.0007
100 50 2.3 0.0044 0.0190 0.0044 0.0188 0.0035 0.0050 0.0457 0.1119 0.0124 0.0260 -0.0002 0.0006
Gamma ((σ1, σ2) = {(1.4, 1); (0.7, 1); (1.4, 0.7)}, μ2 = 1)
10 10 -1 0.1486 0.3816 0.1484 0.3500 0.1486 0.0954 0.2391 1.4308 0.0840 0.5045 0.0016 0.0130
15 10 -1 0.1215 0.2780 0.1214 0.2600 0.0930 0.0675 0.2522 1.2892 0.0690 0.3685 0.0005 0.0097
20 10 -1 0.1023 0.2191 0.1023 0.2074 0.0723 0.0523 0.2581 1.2084 0.0633 0.2935 -0.0014 0.0080
20 20 -1 0.0807 0.1788 0.0807 0.1718 0.0807 0.0447 0.0999 0.4561 0.0388 0.2261 -0.0026 0.0065
30 20 -1 0.0593 0.1275 0.0592 0.1235 0.0449 0.0312 0.1064 0.3935 0.0309 0.1652 0.0006 0.0048
40 20 -1 0.0502 0.1049 0.0502 0.1022 0.0335 0.0248 0.1108 0.3760 0.0241 0.1365 0.0011 0.0040
50 50 -1 0.0304 0.0674 0.0304 0.0663 0.0304 0.0168 0.0354 0.1454 0.0134 0.0837 0.0001 0.0025
75 50 -1 0.0247 0.0490 0.0247 0.0484 0.0187 0.0121 0.0389 0.1302 0.0117 0.0630 -0.0002 0.0019
100 50 -1 0.0194 0.0394 0.0194 0.0390 0.0127 0.0095 0.0429 0.1178 0.0094 0.0515 0.0006 0.0016
10 10 -2 0.0699 0.5993 0.0698 0.5496 0.0699 0.1498 0.2696 2.1928 0.0792 1.0108 -0.0034 0.0028
15 10 -2 0.0597 0.5455 0.0596 0.5100 0.0632 0.1156 0.2754 2.1580 0.0645 0.8701 -0.0030 0.0023
20 10 -2 0.0505 0.5064 0.0505 0.4793 0.0618 0.0983 0.2784 2.1929 0.0541 0.7874 0.0042 0.0021
20 20 -2 0.0329 0.2655 0.0329 0.2551 0.0329 0.0664 0.1141 0.6544 0.0336 0.4179 -0.0019 0.0013
30 20 -2 0.0274 0.2447 0.0274 0.2371 0.0282 0.0502 0.1163 0.6409 0.0250 0.3785 0.0008 0.0011
40 20 -2 0.0241 0.2283 0.0241 0.2224 0.0289 0.0423 0.1191 0.6215 0.0236 0.3475 0.0004 0.0010
50 50 -2 0.0124 0.0998 0.0124 0.0983 0.0124 0.0250 0.0429 0.2008 0.0109 0.1501 0.0010 0.0005
75 50 -2 0.0106 0.0931 0.0106 0.0920 0.0107 0.0188 0.0432 0.1963 0.0101 0.1388 0.0013 0.0004
100 50 -2 0.0096 0.0886 0.0096 0.0877 0.0109 0.0159 0.0426 0.1951 0.0077 0.1313 0.0013 0.0004
10 10 1 0.0482 0.1935 0.0481 0.1774 0.0482 0.0484 0.4475 3.3040 0.0765 0.2828 0.0003 0.0128
15 10 1 0.0378 0.1229 0.0377 0.1149 0.0414 0.0413 0.4147 2.9108 0.0452 0.1735 -0.0002 0.0101
20 10 1 0.0328 0.0914 0.0327 0.0865 0.0435 0.0372 0.3989 2.6204 0.0318 0.1261 0.0001 0.0086
20 20 1 0.0275 0.0858 0.0275 0.0824 0.0275 0.0214 0.1944 1.0507 0.0356 0.1214 0.0005 0.0061
30 20 1 0.0190 0.0561 0.0190 0.0543 0.0200 0.0192 0.1825 0.8873 0.0214 0.0767 -0.0002 0.0049
40 20 1 0.0175 0.0433 0.0174 0.0422 0.0223 0.0181 0.1692 0.8144 0.0133 0.0577 -0.0001 0.0042
50 50 1 0.0108 0.0323 0.0108 0.0318 0.0108 0.0081 0.0666 0.3309 0.0111 0.0439 0.0000 0.0024
75 50 1 0.0082 0.0215 0.0082 0.0213 0.0085 0.0074 0.0681 0.2695 0.0065 0.0284 -0.0002 0.0019
100 50 1 0.0071 0.0166 0.0071 0.0164 0.0087 0.0070 0.0629 0.2451 0.0046 0.0219 -0.0001 0.0017
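Among the robust measures benchmarked in Tables 4.3 and 4.4, the probability of superiority for independent groups (PSindep) has a particularly simple estimate: the Mann-Whitney U statistic divided by n1·n2. The Python sketch below assumes the usual convention that the statistic estimates P(X > Y), with ties counted as 1/2:

```python
import numpy as np
from scipy.stats import mannwhitneyu

def ps_indep(x, y):
    """Estimate P(X > Y), ties counted as 1/2, via the Mann-Whitney U statistic."""
    u = mannwhitneyu(x, y, alternative="two-sided").statistic
    return u / (len(x) * len(y))
```

For two completely separated samples the estimate is 1, and for identical samples it is 0.5; its bounded 0–1 scale explains the very small variances in the PSindep columns.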
Table 4.5: NC and BS BCa CIs under Normal distributions (assume n1=n2=10, σ1=σ2=1)
True parameter (δ) | Statistic | NC (ds) | NC (Shieh's d) | BS BCa (ds) | BS BCa (gs)
δ = 0
% of Coverage 0.9497 0.9512 0.9596 0.9188
Mean low bound -0.8885 -0.4449 -0.9571 -0.9283
Median low bound -0.8765 -0.4383 -0.9495 -0.9055
SD low bound 0.4678 0.2337 0.4679 0.5643
Mean up bound 0.8935 0.4474 0.9613 0.9344
Median up bound 0.8810 0.4405 0.9526 0.9102
SD up bound 0.4682 0.2339 0.4677 0.5645
Mean Width 1.7821 0.8923 1.9184 1.8626
Median Width 1.7665 0.8832 1.9053 1.8366
SD Width 0.0398 0.0219 0.1409 0.1608
Empirical power 0.9509 0.9522 0.9596 0.9188
δ = 0.2
% of Coverage 0.9484 0.9498 0.9571 0.9156
Mean low bound -0.6874 -0.3446 -0.7582 -0.6911
Median low bound -0.6798 -0.3399 -0.7626 -0.6766
SD low bound 0.4608 0.2296 0.4668 0.5529
Mean up bound 1.1001 0.5507 1.1630 1.1772
Median up bound 1.0823 0.5411 1.1436 1.1441
SD up bound 0.4857 0.2432 0.4813 0.5883
Mean Width 1.7875 0.8953 1.9212 1.8682
Median Width 1.7710 0.8855 1.9065 1.8390
SD Width 0.0472 0.0260 0.1494 0.1725
Empirical power 0.0713 0.0691 0.0587 0.1114
δ = 0.5
% of Coverage 0.9489 0.9505 0.9548 0.9113
Mean low bound -0.3910 -0.1971 -0.4590 -0.3409
Median low bound -0.3935 -0.1968 -0.4771 -0.3404
SD low bound 0.4496 0.2232 0.4711 0.5375
Mean up bound 1.4228 0.7125 1.4751 1.5533
Median up bound 1.3953 0.6977 1.4416 1.5050
SD up bound 0.5110 0.2567 0.5051 0.6245
Mean Width 1.8138 0.9096 1.9341 1.8942
Median Width 1.7889 0.8967 1.9127 1.8495
SD Width 0.0727 0.0399 0.1843 0.2217
Empirical power 0.1836 0.1799 0.1562 0.2532
δ = 1
% of Coverage 0.9496 0.9526 0.9440 0.8984
Mean low bound 0.0825 0.0373 0.0394 0.2195
Median low bound 0.0671 0.0313 0.0024 0.1977
SD low bound 0.4451 0.2199 0.5007 0.5314
Mean up bound 1.9880 0.9970 2.0152 2.2085
Median up bound 1.9409 0.9727 1.9592 2.1361
SD up bound 0.5677 0.2865 0.5680 0.7049
Mean Width 1.9055 0.9596 1.9758 1.9890
Median Width 1.8738 0.9414 1.9376 1.9151
SD Width 0.1280 0.0704 0.2727 0.3398
Empirical power 0.5601 0.5546 0.5022 0.6551
δ = 2
% of Coverage 0.9499 0.9549 0.9135 0.8692
Mean low bound 0.9532 0.4660 1.0174 1.2456
Median low bound 0.9213 0.4494 0.9733 1.2047
SD low bound 0.4791 0.2356 0.5852 0.5744
Mean up bound 3.1885 1.6038 3.1591 3.5858
Median up bound 3.1126 1.5652 3.0740 3.4730
SD up bound 0.7226 0.3660 0.7370 0.9173
Mean Width 2.2352 1.1378 2.1416 2.3402
Median Width 2.1913 1.1136 2.0754 2.2329
SD Width 0.2455 0.1342 0.4551 0.5727
Empirical power 0.9867 0.9862 0.9753 0.9933
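The BS BCa columns in Tables 4.5–4.16 are bias-corrected-and-accelerated bootstrap intervals for the standardized mean difference. A sketch using SciPy's built-in BCa implementation follows; the resample count, seed, and simulated data are arbitrary illustrative choices, not the thesis's simulation settings:

```python
import numpy as np
from scipy import stats

def cohens_ds(x, y):
    """Cohen's ds with the pooled-SD denominator."""
    n1, n2 = len(x), len(y)
    sp = np.sqrt(((n1 - 1) * np.var(x, ddof=1) + (n2 - 1) * np.var(y, ddof=1))
                 / (n1 + n2 - 2))
    return (np.mean(x) - np.mean(y)) / sp

rng = np.random.default_rng(7)
x = rng.normal(0.5, 1.0, size=20)  # simulated "treatment" group (assumption)
y = rng.normal(0.0, 1.0, size=20)  # simulated "control" group (assumption)

# 95% BCa interval: resamples each group independently, then adjusts the
# percentile interval for bias and skewness of the bootstrap distribution
res = stats.bootstrap((x, y), cohens_ds, method="BCa", vectorized=False,
                      n_resamples=2000, confidence_level=0.95, random_state=rng)
ci = res.confidence_interval  # ci.low and ci.high bracket the ds estimate
```

Passing method="percentile" or method="basic" instead yields the uncorrected bootstrap intervals, which is useful for seeing how much the BCa adjustment matters under skewed distributions.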
Table 4.6: NC and BS BCa CIs under Normal distributions (assume n1=n2=10, σ1=2, σ2=1)
True parameter (δ) | Statistic | NC (ds) | NC (Shieh's d) | BS BCa (ds) | BS BCa (gs)
δ = 0
% of Coverage 0.9452 0.9500 0.9583 0.9124
Mean low bound -0.8908 -0.4475 -0.9854 -0.9606
Median low bound -0.8765 -0.4383 -0.9765 -0.9274
SD low bound 0.4784 0.2384 0.4820 0.6048
Mean up bound 0.8924 0.4483 0.9870 0.9628
Median up bound 0.8765 0.4383 0.9772 0.9299
SD up bound 0.4786 0.2386 0.4818 0.6049
Mean Width 1.7832 0.8958 1.9724 1.9233
Median Width 1.7665 0.8855 1.9514 1.8838
SD Width 0.0419 0.0267 0.1807 0.2128
Empirical power 0.9464 0.9510 0.9583 0.9124
δ = 0.2
% of Coverage 0.9456 0.9502 0.9572 0.9103
Mean low bound -0.6840 -0.3449 -0.7814 -0.7082
Median low bound -0.6798 -0.3399 -0.7903 -0.6882
SD low bound 0.4663 0.2307 0.4767 0.5823
Mean up bound 1.1043 0.5543 1.1935 1.2206
Median up bound 1.0823 0.5411 1.1665 1.1752
SD up bound 0.4925 0.2473 0.4939 0.6294
Mean Width 1.7884 0.8992 1.9748 1.9288
Median Width 1.7710 0.8877 1.9520 1.8854
SD Width 0.0492 0.0311 0.1898 0.2274
Empirical power 0.0748 0.0688 0.0586 0.1165
δ = 0.5
% of Coverage 0.9450 0.9503 0.9532 0.9058
Mean low bound -0.3868 -0.1989 -0.4821 -0.3472
Median low bound -0.3935 -0.2012 -0.5119 -0.3487
SD low bound 0.4565 0.2235 0.4861 0.5629
Mean up bound 1.4284 0.7177 1.5077 1.6145
Median up bound 1.3908 0.6977 1.4551 1.5477
SD up bound 0.5212 0.2643 0.5273 0.6761
Mean Width 1.8152 0.9166 1.9898 1.9617
Median Width 1.7889 0.8989 1.9580 1.8997
SD Width 0.0766 0.0482 0.2370 0.2897
Empirical power 0.1873 0.1751 0.1499 0.2581
δ = 1
% of Coverage 0.9423 0.9484 0.9371 0.8856
Mean low bound 0.0917 0.0329 0.0246 0.2309
Median low bound 0.0716 0.0246 -0.0221 0.2085
SD low bound 0.4622 0.2241 0.5346 0.5592
Mean up bound 2.0017 1.0107 2.0662 2.3130
Median up bound 1.9499 0.9816 1.9953 2.2253
SD up bound 0.5928 0.3034 0.6178 0.7865
Mean Width 1.9100 0.9778 2.0417 2.0821
Median Width 1.8738 0.9570 1.9865 1.9782
SD Width 0.1365 0.0838 0.3545 0.4424
Empirical power 0.5629 0.5407 0.4833 0.6545
δ = 2
% of Coverage 0.9362 0.9464 0.8975 0.8486
Mean low bound 0.9786 0.4541 1.0320 1.2766
Median low bound 0.9347 0.4293 0.9816 1.2210
SD low bound 0.5157 0.2512 0.6377 0.6146
Mean up bound 3.2298 1.6468 3.2708 3.7966
Median up bound 3.1350 1.5988 3.1659 3.6577
SD up bound 0.7845 0.4014 0.8330 1.0475
Mean Width 2.2512 1.1928 2.2388 2.5200
Median Width 2.2003 1.1672 2.1416 2.3746
SD Width 0.2710 0.1572 0.5756 0.7277
Empirical power 0.9866 0.9834 0.9650 0.9936
Table 4.7: NC and BS BCa CIs under Normal distributions (n1=20, n2=10, σ1=σ2=1)
Statistic NC(d_s) NC(Shieh’s d) BS BCa(d_s) BS BCa(g_s)
δ = 0
% of Coverage 0.9500 0.9494 0.9498 0.9220
Mean low bound -0.7708 -0.3651 -0.7958 -0.7774
Median low bound -0.7630 -0.3597 -0.7942 -0.7691
SD low bound 0.3986 0.1909 0.4072 0.4525
Mean up bound 0.7640 0.3619 0.7888 0.7697
Median up bound 0.7552 0.3560 0.7849 0.7588
SD up bound 0.3983 0.1907 0.4060 0.4511
Mean Width 1.5348 0.7270 1.5845 1.5471
Median Width 1.5260 0.7212 1.5716 1.5340
SD Width 0.0206 0.0148 0.1787 0.1723
Empirical power 0.9510 0.9507 0.9498 0.9220
δ = 0.2
% of Coverage 0.9484 0.9489 0.9483 0.9211
Mean low bound -0.5645 -0.2674 -0.5907 -0.5490
Median low bound -0.5616 -0.2647 -0.5916 -0.5427
SD low bound 0.3934 0.1868 0.4062 0.4457
Mean up bound 0.9741 0.4624 0.9955 1.0021
Median up bound 0.9644 0.4546 0.9875 0.9856
SD up bound 0.4086 0.1975 0.4134 0.4638
Mean Width 1.5386 0.7297 1.5862 1.5511
Median Width 1.5298 0.7230 1.5727 1.5362
SD Width 0.0258 0.0184 0.1819 0.1769
Empirical power 0.0784 0.0775 0.0792 0.1157
δ = 0.5
% of Coverage 0.9513 0.9502 0.9485 0.9201
Mean low bound -0.2696 -0.1293 -0.2959 -0.2223
Median low bound -0.2711 -0.1315 -0.3043 -0.2232
SD low bound 0.3852 0.1817 0.4044 0.4351
Mean up bound 1.2879 0.6138 1.3038 1.3486
Median up bound 1.2703 0.6025 1.2901 1.3273
SD up bound 0.4223 0.2076 0.4248 0.4824
Mean Width 1.5574 0.7432 1.5997 1.5709
Median Width 1.5453 0.7339 1.5835 1.5525
SD Width 0.0421 0.0299 0.1938 0.1939
Empirical power 0.2343 0.2268 0.2264 0.2996
δ = 1
% of Coverage 0.9514 0.9485 0.9424 0.9130
Mean low bound 0.2110 0.0921 0.1937 0.3108
Median low bound 0.2014 0.0840 0.1796 0.2996
SD low bound 0.3857 0.1836 0.4196 0.4355
Mean up bound 1.8355 0.8825 1.8365 1.9497
Median up bound 1.8087 0.8654 1.8096 1.9134
SD up bound 0.4606 0.2334 0.4616 0.5288
Mean Width 1.6245 0.7904 1.6427 1.6388
Median Width 1.6073 0.7796 1.6178 1.6081
SD Width 0.0775 0.0538 0.2307 0.2431
Empirical power 0.7021 0.6816 0.6696 0.7627
δ = 2
% of Coverage 0.9495 0.9448 0.9275 0.8954
Mean low bound 1.1093 0.4971 1.1368 1.3060
Median low bound 1.0844 0.4783 1.1123 1.2802
SD low bound 0.4176 0.2117 0.4746 0.4729
Mean up bound 2.9791 1.4537 2.9444 3.1959
Median up bound 2.9318 1.4259 2.8948 3.1358
SD up bound 0.5684 0.3009 0.5725 0.6605
Mean Width 1.8697 0.9566 1.8076 1.8898
Median Width 1.8474 0.9421 1.7664 1.8365
SD Width 0.1518 0.1004 0.3224 0.3635
Empirical power 0.9987 0.9981 0.9962 0.9992
Table 4.8: NC and BS BCa CIs under Normal distributions (n1=20, n2=10, σ1=2, σ2=1)
Statistic NC(d_s) NC(Shieh’s d) BS BCa(d_s) BS BCa(g_s)
δ = 0
% of Coverage 0.9808 0.9497 0.9563 0.9283
Mean low bound -0.7643 -0.3614 -0.6538 -0.6403
Median low bound -0.7591 -0.3578 -0.6447 -0.6273
SD low bound 0.3264 0.1881 0.3315 0.3709
Mean up bound 0.7660 0.3626 0.6555 0.6422
Median up bound 0.7591 0.3578 0.6472 0.6290
SD up bound 0.3264 0.1881 0.3311 0.3704
Mean Width 1.5303 0.7240 1.3093 1.2825
Median Width 1.5260 0.7193 1.2977 1.2687
SD Width 0.0141 0.0104 0.1102 0.1095
Empirical power 0.9813 0.9508 0.9563 0.9283
δ = 0.2
% of Coverage 0.9815 0.9517 0.9569 0.9291
Mean low bound -0.5609 -0.2444 -0.4514 -0.4141
Median low bound -0.5616 -0.2428 -0.4482 -0.4063
SD low bound 0.3204 0.1835 0.3240 0.3583
Mean up bound 0.9732 0.4824 0.8615 0.8736
Median up bound 0.9644 0.4765 0.8470 0.8549
SD up bound 0.3329 0.1927 0.3406 0.3841
Mean Width 1.5340 0.7268 1.3129 1.2877
Median Width 1.5260 0.7212 1.3010 1.2729
SD Width 0.0189 0.0139 0.1154 0.1173
Empirical power 0.0410 0.0915 0.0823 0.1250
δ = 0.5
% of Coverage 0.9798 0.9509 0.9516 0.9238
Mean low bound -0.2636 -0.0742 -0.1544 -0.0867
Median low bound -0.2711 -0.0767 -0.1636 -0.0885
SD low bound 0.3202 0.1813 0.3260 0.3514
Mean up bound 1.2901 0.6671 1.1776 1.2287
Median up bound 1.2742 0.6591 1.1555 1.2015
SD up bound 0.3517 0.2044 0.3654 0.4142
Mean Width 1.5536 0.7413 1.3320 1.3154
Median Width 1.5453 0.7339 1.3159 1.2931
SD Width 0.0346 0.0253 0.1398 0.1512
Empirical power 0.1948 0.3324 0.3042 0.3974
δ = 1
% of Coverage 0.9771 0.9516 0.9408 0.9134
Mean low bound 0.2165 0.1977 0.3312 0.4348
Median low bound 0.2014 0.1917 0.3125 0.4201
SD low bound 0.3312 0.1831 0.3452 0.3560
Mean up bound 1.8382 0.9890 1.7292 1.8474
Median up bound 1.8087 0.9749 1.6974 1.8075
SD up bound 0.3969 0.2295 0.4203 0.4791
Mean Width 1.6216 0.7913 1.3980 1.4126
Median Width 1.6073 0.7814 1.3737 1.3767
SD Width 0.0674 0.0480 0.2011 0.2292
Empirical power 0.7362 0.8639 0.8359 0.8984
δ = 2
% of Coverage 0.9669 0.9526 0.9210 0.8914
Mean low bound 1.1204 0.7047 1.2546 1.3994
Median low bound 1.0883 0.6901 1.2254 1.3673
SD low bound 0.3846 0.2062 0.4137 0.4121
Mean up bound 2.9920 1.6719 2.8967 3.1535
Median up bound 2.9396 1.6450 2.8436 3.0878
SD up bound 0.5254 0.2976 0.5589 0.6425
Mean Width 1.8716 0.9673 1.6421 1.7541
Median Width 1.8474 0.9549 1.5957 1.6927
SD Width 0.1415 0.0939 0.3295 0.3847
Empirical power 0.9999 1.0000 0.9999 1.0000
Table 4.9: NC and BS BCa CIs under Normal distributions (n1=75, n2=50, σ1=σ2=1)
Statistic NC(d_s) NC(Shieh’s d) BS BCa(d_s) BS BCa(g_s)
δ = 0
% of Coverage 0.9493 0.9490 0.9507 0.9434
Mean low bound -0.3585 -0.1757 -0.3595 -0.3574
Median low bound -0.3578 -0.1753 -0.3587 -0.3563
SD low bound 0.1837 0.0901 0.1837 0.1886
Mean up bound 0.3598 0.1764 0.3609 0.3588
Median up bound 0.3597 0.1762 0.3603 0.3577
SD up bound 0.1837 0.0901 0.1838 0.1888
Mean Width 0.7183 0.3520 0.7204 0.7162
Median Width 0.7175 0.3515 0.7194 0.7156
SD Width 0.0020 0.0012 0.0342 0.0299
Empirical power 0.9504 0.9504 0.9507 0.9434
δ = 0.2
% of Coverage 0.9487 0.9492 0.9494 0.9421
Mean low bound -0.1588 -0.0779 -0.1613 -0.1534
Median low bound -0.1588 -0.0778 -0.1612 -0.1527
SD low bound 0.1833 0.0898 0.1833 0.1880
Mean up bound 0.5611 0.2751 0.5606 0.5644
Median up bound 0.5605 0.2746 0.5598 0.5631
SD up bound 0.1862 0.0915 0.1861 0.1914
Mean Width 0.7199 0.3530 0.7219 0.7178
Median Width 0.7193 0.3524 0.7208 0.7169
SD Width 0.0036 0.0021 0.0351 0.0310
Empirical power 0.1899 0.1898 0.1879 0.2064
δ = 0.5
% of Coverage 0.9503 0.9499 0.9501 0.9432
Mean low bound 0.1373 0.0669 0.1324 0.1487
Median low bound 0.1369 0.0662 0.1308 0.1474
SD low bound 0.1828 0.0896 0.1834 0.1875
Mean up bound 0.8655 0.4248 0.8627 0.8753
Median up bound 0.8618 0.4231 0.8593 0.8719
SD up bound 0.1905 0.0940 0.1905 0.1961
Mean Width 0.7283 0.3578 0.7303 0.7266
Median Width 0.7266 0.3569 0.7284 0.7247
SD Width 0.0081 0.0047 0.0396 0.0362
Empirical power 0.7704 0.7696 0.7647 0.7858
δ = 1
% of Coverage 0.9487 0.9488 0.9471 0.9394
Mean low bound 0.6260 0.3052 0.6194 0.6484
Median low bound 0.6226 0.3032 0.6161 0.6452
SD low bound 0.1873 0.0922 0.1895 0.1929
Mean up bound 1.3860 0.6812 1.3788 1.4064
Median up bound 1.3821 0.6789 1.3740 1.4016
SD up bound 0.2036 0.1013 0.2038 0.2099
Mean Width 0.7600 0.3760 0.7594 0.7580
Median Width 0.7577 0.3748 0.7560 0.7546
SD Width 0.0164 0.0094 0.0524 0.0502
Empirical power 0.9997 0.9998 0.9997 0.9998
δ = 2
% of Coverage 0.9513 0.9506 0.9443 0.9362
Mean low bound 1.5714 0.7645 1.5647 1.6150
Median low bound 1.5647 0.7603 1.5575 1.6072
SD low bound 0.2083 0.1046 0.2136 0.2154
Mean up bound 2.4469 1.2058 2.4299 2.4866
Median up bound 2.4374 1.2003 2.4199 2.4766
SD up bound 0.2406 0.1216 0.2418 0.2495
Mean Width 0.8755 0.4413 0.8652 0.8715
Median Width 0.8727 0.4401 0.8589 0.8651
SD Width 0.0324 0.0184 0.0829 0.0817
Empirical power 1.0000 1.0000 1.0000 1.0000
Table 4.10: NC and BS BCa CIs under Normal distributions (n1=75, n2=50, σ1=2, σ2=1)
Statistic NC(d_s) NC(Shieh’s d) BS BCa(d_s) BS BCa(g_s)
δ = 0
% of Coverage 0.9722 0.9498 0.9520 0.9444
Mean low bound -0.3587 -0.1758 -0.3201 -0.3183
Median low bound -0.3578 -0.1753 -0.3191 -0.3169
SD low bound 0.1635 0.0903 0.1636 0.1684
Mean up bound 0.3593 0.1761 0.3208 0.3189
Median up bound 0.3597 0.1762 0.3203 0.3182
SD up bound 0.1635 0.0903 0.1636 0.1684
Mean Width 0.7180 0.3519 0.6409 0.6372
Median Width 0.7175 0.3515 0.6400 0.6365
SD Width 0.0017 0.0011 0.0281 0.0237
Empirical power 0.9729 0.9512 0.9520 0.9444
δ = 0.2
% of Coverage 0.9715 0.9490 0.9507 0.9434
Mean low bound -0.1588 -0.0654 -0.1218 -0.1137
Median low bound -0.1588 -0.0662 -0.1223 -0.1138
SD low bound 0.1623 0.0894 0.1615 0.1661
Mean up bound 0.5608 0.2876 0.5215 0.5261
Median up bound 0.5587 0.2862 0.5192 0.5231
SD up bound 0.1649 0.0912 0.1659 0.1711
Mean Width 0.7196 0.3530 0.6432 0.6399
Median Width 0.7193 0.3524 0.6421 0.6388
SD Width 0.0031 0.0021 0.0298 0.0259
Empirical power 0.1598 0.2281 0.2235 0.2464
δ = 0.5
% of Coverage 0.9711 0.9510 0.9511 0.9434
Mean low bound 0.1398 0.0989 0.1731 0.1901
Median low bound 0.1388 0.0984 0.1712 0.1882
SD low bound 0.1632 0.0894 0.1623 0.1660
Mean up bound 0.8679 0.4577 0.8280 0.8425
Median up bound 0.8636 0.4562 0.8243 0.8387
SD up bound 0.1702 0.0940 0.1722 0.1778
Mean Width 0.7281 0.3588 0.6548 0.6524
Median Width 0.7266 0.3578 0.6527 0.6501
SD Width 0.0073 0.0048 0.0371 0.0348
Empirical power 0.8028 0.8661 0.8594 0.8754
δ = 1
% of Coverage 0.9668 0.9492 0.9464 0.9376
Mean low bound 0.6267 0.3662 0.6536 0.6838
Median low bound 0.6226 0.3640 0.6486 0.6790
SD low bound 0.1715 0.0928 0.1715 0.1739
Mean up bound 1.3865 0.7459 1.3484 1.3792
Median up bound 1.3803 0.7433 1.3423 1.3728
SD up bound 0.1864 0.1023 0.1903 0.1964
Mean Width 0.7598 0.3798 0.6948 0.6955
Median Width 0.7577 0.3792 0.6903 0.6907
SD Width 0.0151 0.0097 0.0556 0.0547
Empirical power 1.0000 1.0000 1.0000 1.0000
δ = 2
% of Coverage 0.9557 0.9499 0.9404 0.9320
Mean low bound 1.5762 0.8856 1.5863 1.6387
Median low bound 1.5683 0.8819 1.5783 1.6303
SD low bound 0.2033 0.1073 0.2066 0.2066
Mean up bound 2.4524 1.3400 2.4240 2.4867
Median up bound 2.4410 1.3345 2.4136 2.4754
SD up bound 0.2349 0.1259 0.2409 0.2493
Mean Width 0.8762 0.4544 0.8376 0.8480
Median Width 0.8745 0.4535 0.8281 0.8388
SD Width 0.0317 0.0191 0.0955 0.0953
Empirical power 1.0000 1.0000 1.0000 1.0000
Table 4.11: NC and BS BCa CIs under Skew-normal distributions (n1=10, n2=10, σ1=1, σ2=1)
Statistic NC(d_s) NC(Shieh’s d) BS BCa(d_s) BS BCa(g_s)
δ = 0
% of Coverage 0.9480 0.9497 0.9589 0.9168
Mean low bound -0.8678 -0.4347 -0.9279 -0.8583
Median low bound -0.8676 -0.4338 -0.9343 -0.8493
SD low bound 0.4672 0.2330 0.4452 0.5305
Mean up bound 0.9146 0.4580 1.0003 1.0209
Median up bound 0.8855 0.4427 0.9747 0.9760
SD up bound 0.4752 0.2377 0.5054 0.6192
Mean Width 1.7825 0.8927 1.9283 1.8792
Median Width 1.7665 0.8832 1.9091 1.8418
SD Width 0.0409 0.0232 0.1600 0.1923
Empirical power 0.9491 0.9508 0.9589 0.9168
δ = 0.2
% of Coverage 0.9446 0.9464 0.9560 0.9129
Mean low bound -0.6654 -0.3338 -0.7384 -0.6345
Median low bound -0.6753 -0.3376 -0.7586 -0.6371
SD low bound 0.4690 0.2332 0.4552 0.5304
Mean up bound 1.1246 0.5631 1.2200 1.2863
Median up bound 1.0867 0.5434 1.1831 1.2293
SD up bound 0.5024 0.2520 0.5310 0.6566
Mean Width 1.7900 0.8969 1.9584 1.9207
Median Width 1.7710 0.8855 1.9284 1.8653
SD Width 0.0533 0.0302 0.1832 0.2299
Empirical power 0.0821 0.0800 0.0616 0.1199
δ = 0.5
% of Coverage 0.9374 0.9401 0.9480 0.9031
Mean low bound -0.3686 -0.1864 -0.4526 -0.3042
Median low bound -0.3846 -0.1923 -0.4889 -0.3237
SD low bound 0.4747 0.2351 0.4802 0.5366
Mean up bound 1.4514 0.7271 1.5566 1.6929
Median up bound 1.4043 0.7021 1.5045 1.6209
SD up bound 0.5486 0.2762 0.5763 0.7175
Mean Width 1.8200 0.9135 2.0092 1.9970
Median Width 1.7889 0.8967 1.9635 1.9146
SD Width 0.0856 0.0482 0.2330 0.3020
Empirical power 0.2064 0.2025 0.1605 0.2686
δ = 1
% of Coverage 0.9298 0.9334 0.9330 0.8862
Mean low bound 0.1032 0.0470 0.0217 0.2239
Median low bound 0.0760 0.0358 -0.0338 0.1850
SD low bound 0.4883 0.2406 0.5340 0.5588
Mean up bound 2.0195 1.0133 2.1367 2.3939
Median up bound 1.9543 0.9794 2.0645 2.2961
SD up bound 0.6306 0.3189 0.6604 0.8264
Mean Width 1.9164 0.9663 2.1150 2.1700
Median Width 1.8738 0.9436 2.0492 2.0629
SD Width 0.1488 0.0830 0.3355 0.4402
Empirical power 0.5625 0.5572 0.4739 0.6409
δ = 2
% of Coverage 0.9184 0.9251 0.9019 0.8534
Mean low bound 0.9797 0.4781 0.9698 1.2160
Median low bound 0.9302 0.4562 0.9022 1.1516
SD low bound 0.5500 0.2695 0.6597 0.6488
Mean up bound 3.2353 1.6284 3.3539 3.8576
Median up bound 3.1305 1.5742 3.2429 3.7112
SD up bound 0.8365 0.4247 0.8690 1.0911
Mean Width 2.2556 1.1503 2.3841 2.6416
Median Width 2.2003 1.1180 2.2889 2.4952
SD Width 0.2891 0.1596 0.5421 0.7112
Empirical power 0.9798 0.9788 0.9531 0.9885
Table 4.12: NC and BS BCa CIs under Skew-normal distributions (n1=10, n2=10, σ1=2, σ2=1)
Statistic NC(d_s) NC(Shieh’s d) BS BCa(d_s) BS BCa(g_s)
δ = 0
% of Coverage 0.9442 0.9489 0.9568 0.9117
Mean low bound -0.8870 -0.4458 -0.9752 -0.9352
Median low bound -0.8810 -0.4405 -0.9732 -0.9116
SD low bound 0.4790 0.2379 0.4687 0.5848
Mean up bound 0.8963 0.4503 0.9986 0.9921
Median up bound 0.8765 0.4383 0.9837 0.9472
SD up bound 0.4818 0.2410 0.5006 0.6334
Mean Width 1.7834 0.8961 1.9739 1.9273
Median Width 1.7665 0.8855 1.9523 1.8863
SD Width 0.0430 0.0278 0.1842 0.2209
Empirical power 0.9454 0.9502 0.9568 0.9117
δ = 0.2
% of Coverage 0.9444 0.9494 0.9571 0.9113
Mean low bound -0.6780 -0.3423 -0.7737 -0.6881
Median low bound -0.6798 -0.3399 -0.7869 -0.6780
SD low bound 0.4685 0.2309 0.4674 0.5687
Mean up bound 1.1110 0.5578 1.2126 1.2592
Median up bound 1.0823 0.5411 1.1769 1.1985
SD up bound 0.4976 0.2508 0.5162 0.6635
Mean Width 1.7890 0.9001 1.9863 1.9473
Median Width 1.7710 0.8877 1.9584 1.8948
SD Width 0.0513 0.0340 0.2015 0.2505
Empirical power 0.0773 0.0710 0.0587 0.1166
δ = 0.5
% of Coverage 0.9398 0.9460 0.9497 0.9016
Mean low bound -0.3771 -0.1949 -0.4754 -0.3299
Median low bound -0.3846 -0.1968 -0.5105 -0.3355
SD low bound 0.4680 0.2280 0.4867 0.5611
Mean up bound 1.4408 0.7246 1.5407 1.6732
Median up bound 1.3998 0.7021 1.4796 1.5923
SD up bound 0.5376 0.2737 0.5651 0.7318
Mean Width 1.8180 0.9195 2.0160 2.0031
Median Width 1.7889 0.9011 1.9736 1.9199
SD Width 0.0821 0.0538 0.2570 0.3316
Empirical power 0.1969 0.1833 0.1526 0.2644
δ = 1
% of Coverage 0.9369 0.9453 0.9358 0.8856
Mean low bound 0.0976 0.0344 0.0199 0.2305
Median low bound 0.0716 0.0224 -0.0365 0.1982
SD low bound 0.4753 0.2284 0.5374 0.5627
Mean up bound 2.0108 1.0165 2.1090 2.3837
Median up bound 1.9454 0.9816 2.0241 2.2712
SD up bound 0.6138 0.3162 0.6595 0.8494
Mean Width 1.9132 0.9820 2.0892 2.1532
Median Width 1.8738 0.9593 2.0236 2.0326
SD Width 0.1448 0.0930 0.3697 0.4847
Empirical power 0.5595 0.5383 0.4723 0.6494
δ = 2
% of Coverage 0.9271 0.9409 0.8983 0.8522
Mean low bound 0.9819 0.4535 1.0059 1.2534
Median low bound 0.9302 0.4271 0.9482 1.1927
SD low bound 0.5362 0.2573 0.6451 0.6275
Mean up bound 3.2368 1.6526 3.3388 3.9029
Median up bound 3.1260 1.5966 3.2111 3.7354
SD up bound 0.8178 0.4223 0.8947 1.1412
Mean Width 2.2549 1.1991 2.3329 2.6495
Median Width 2.1958 1.1672 2.2305 2.4869
SD Width 0.2840 0.1728 0.5991 0.7876
Empirical power 0.9846 0.9816 0.9619 0.9923
Table 4.13: NC and BS BCa CIs under Skew-normal distributions (n1=20, n2=10, σ1=1, σ2=1)
Statistic NC(d_s) NC(Shieh’s d) BS BCa(d_s) BS BCa(g_s)
δ = 0
% of Coverage 0.9491 0.9458 0.9491 0.9197
Mean low bound -0.7537 -0.3503 -0.7980 -0.7380
Median low bound -0.7514 -0.3542 -0.8063 -0.7343
SD low bound 0.3981 0.1922 0.4183 0.4556
Mean up bound 0.7812 0.3767 0.7878 0.8109
Median up bound 0.7707 0.3633 0.7767 0.7955
SD up bound 0.3998 0.1938 0.3977 0.4511
Mean Width 1.5349 0.7270 1.5857 1.5489
Median Width 1.5260 0.7212 1.5725 1.5349
SD Width 0.0207 0.0148 0.1836 0.1776
Empirical power 0.9502 0.9470 0.9491 0.9197
δ = 0.2
% of Coverage 0.9460 0.9391 0.9465 0.9168
Mean low bound -0.5528 -0.2547 -0.6060 -0.5244
Median low bound -0.5538 -0.2611 -0.6161 -0.5260
SD low bound 0.3983 0.1945 0.4255 0.4556
Mean up bound 0.9867 0.4756 1.0007 1.0481
Median up bound 0.9682 0.4583 0.9819 1.0262
SD up bound 0.4154 0.2069 0.4136 0.4709
Mean Width 1.5395 0.7302 1.6066 1.5724
Median Width 1.5298 0.7230 1.5908 1.5543
SD Width 0.0272 0.0192 0.1985 0.1946
Empirical power 0.0852 0.0983 0.0844 0.1304
δ = 0.5
% of Coverage 0.9407 0.9267 0.9424 0.9116
Mean low bound -0.2557 -0.1150 -0.3196 -0.2096
Median low bound -0.2595 -0.1260 -0.3352 -0.2161
SD low bound 0.4031 0.2004 0.4418 0.4604
Mean up bound 1.3043 0.6298 1.3306 1.4168
Median up bound 1.2820 0.6080 1.3071 1.3875
SD up bound 0.4439 0.2291 0.4439 0.5079
Mean Width 1.5601 0.7448 1.6502 1.6263
Median Width 1.5453 0.7339 1.6275 1.6001
SD Width 0.0464 0.0325 0.2268 0.2290
Empirical power 0.2538 0.2625 0.2279 0.3163
δ = 1
% of Coverage 0.9330 0.9094 0.9317 0.9003
Mean low bound 0.2208 0.1052 0.1495 0.2976
Median low bound 0.2091 0.0895 0.1297 0.2824
SD low bound 0.4176 0.2154 0.4765 0.4776
Mean up bound 1.8496 0.8981 1.8910 2.0447
Median up bound 1.8164 0.8709 1.8552 2.0012
SD up bound 0.5001 0.2704 0.5028 0.5774
Mean Width 1.6288 0.7929 1.7415 1.7471
Median Width 1.6073 0.7796 1.7075 1.7060
SD Width 0.0857 0.0585 0.2815 0.2973
Empirical power 0.6958 0.6693 0.6111 0.7323
δ = 2
% of Coverage 0.9223 0.8907 0.9165 0.8828
Mean low bound 1.1208 0.5126 1.0740 1.2659
Median low bound 1.0922 0.4875 1.0516 1.2367
SD low bound 0.4665 0.2596 0.5538 0.5371
Mean up bound 2.9978 1.4732 3.0502 3.3419
Median up bound 2.9435 1.4350 2.9914 3.2694
SD up bound 0.6361 0.3617 0.6399 0.7391
Mean Width 1.8770 0.9605 1.9762 2.0759
Median Width 1.8513 0.9439 1.9194 2.0063
SD Width 0.1708 0.1112 0.3906 0.4398
Empirical power 0.9966 0.9921 0.9808 0.9962
Table 4.14: NC and BS BCa CIs under Skew-normal distributions (n1=20, n2=10, σ1=2, σ2=1)
Statistic NC(d_s) NC(Shieh’s d) BS BCa(d_s) BS BCa(g_s)
δ = 0
% of Coverage 0.9816 0.9502 0.9570 0.9290
Mean low bound -0.7620 -0.3573 -0.6605 -0.6329
Median low bound -0.7591 -0.3578 -0.6530 -0.6212
SD low bound 0.3259 0.1878 0.3361 0.3727
Mean up bound 0.7682 0.3668 0.6482 0.6482
Median up bound 0.7630 0.3597 0.6391 0.6340
SD up bound 0.3262 0.1888 0.3265 0.3680
Mean Width 1.5303 0.7241 1.3087 1.2811
Median Width 1.5260 0.7193 1.2970 1.2675
SD Width 0.0142 0.0107 0.1119 0.1108
Empirical power 0.9821 0.9513 0.9570 0.9290
δ = 0.2
% of Coverage 0.9796 0.9467 0.9552 0.9264
Mean low bound -0.5580 -0.2397 -0.4608 -0.4100
Median low bound -0.5577 -0.2410 -0.4600 -0.4039
SD low bound 0.3239 0.1874 0.3348 0.3666
Mean up bound 0.9764 0.4877 0.8589 0.8846
Median up bound 0.9644 0.4783 0.8421 0.8640
SD up bound 0.3370 0.1980 0.3409 0.3875
Mean Width 1.5343 0.7274 1.3198 1.2946
Median Width 1.5260 0.7212 1.3060 1.2780
SD Width 0.0195 0.0153 0.1228 0.1254
Empirical power 0.0431 0.1012 0.0836 0.1324
δ = 0.5
% of Coverage 0.9779 0.9429 0.9509 0.9223
Mean low bound -0.2612 -0.0699 -0.1675 -0.0873
Median low bound -0.2672 -0.0749 -0.1774 -0.0920
SD low bound 0.3247 0.1884 0.3384 0.3609
Mean up bound 1.2930 0.6727 1.1811 1.2459
Median up bound 1.2742 0.6591 1.1557 1.2147
SD up bound 0.3572 0.2137 0.3691 0.4216
Mean Width 1.5541 0.7425 1.3486 1.3332
Median Width 1.5453 0.7339 1.3307 1.3089
SD Width 0.0357 0.0278 0.1517 0.1642
Empirical power 0.1999 0.3405 0.2975 0.3966
δ = 1
% of Coverage 0.9729 0.9360 0.9385 0.9097
Mean low bound 0.2175 0.2014 0.3136 0.4268
Median low bound 0.2014 0.1935 0.2943 0.4128
SD low bound 0.3395 0.1964 0.3615 0.3704
Mean up bound 1.8399 0.9950 1.7393 1.8702
Median up bound 1.8087 0.9768 1.7022 1.8239
SD up bound 0.4069 0.2472 0.4309 0.4942
Mean Width 1.6224 0.7936 1.4257 1.4434
Median Width 1.6073 0.7832 1.4001 1.4055
SD Width 0.0693 0.0525 0.2133 0.2433
Empirical power 0.7318 0.8501 0.8093 0.8833
δ = 2
% of Coverage 0.9616 0.9324 0.9211 0.8916
Mean low bound 1.1246 0.7097 1.2363 1.3859
Median low bound 1.0922 0.6938 1.2052 1.3524
SD low bound 0.3974 0.2271 0.4287 0.4266
Mean up bound 2.9983 1.6816 2.9295 3.1979
Median up bound 2.9435 1.6505 2.8696 3.1232
SD up bound 0.5435 0.3279 0.5841 0.6757
Mean Width 1.8737 0.9719 1.6931 1.8120
Median Width 1.8513 0.9567 1.6454 1.7477
SD Width 0.1469 0.1033 0.3438 0.4057
Empirical power 0.9998 0.9999 0.9998 1.0000
Table 4.15: NC and BS BCa CIs under Skew-normal distributions (n1=75, n2=50, σ1=1, σ2=1)
Statistic NC(d_s) NC(Shieh’s d) BS BCa(d_s) BS BCa(g_s)
δ = 0
% of Coverage 0.9496 0.9491 0.9506 0.9433
Mean low bound -0.3556 -0.1735 -0.3584 -0.3473
Median low bound -0.3560 -0.1744 -0.3596 -0.3485
SD low bound 0.1840 0.0904 0.1813 0.1861
Mean up bound 0.3627 0.1785 0.3621 0.3693
Median up bound 0.3597 0.1762 0.3598 0.3664
SD up bound 0.1841 0.0904 0.1865 0.1922
Mean Width 0.7183 0.3520 0.7205 0.7167
Median Width 0.7175 0.3515 0.7190 0.7153
SD Width 0.0021 0.0012 0.0353 0.0311
Empirical power 0.9507 0.9503 0.9506 0.9433
δ = 0.2
% of Coverage 0.9466 0.9440 0.9504 0.9432
Mean low bound -0.1560 -0.0756 -0.1665 -0.1497
Median low bound -0.1570 -0.0778 -0.1689 -0.1515
SD low bound 0.1859 0.0921 0.1835 0.1879
Mean up bound 0.5640 0.2773 0.5683 0.5815
Median up bound 0.5605 0.2746 0.5652 0.5778
SD up bound 0.1890 0.0938 0.1914 0.1973
Mean Width 0.7200 0.3530 0.7347 0.7312
Median Width 0.7193 0.3524 0.7326 0.7290
SD Width 0.0038 0.0021 0.0393 0.0358
Empirical power 0.1977 0.2012 0.1813 0.2114
δ = 0.5
% of Coverage 0.9392 0.9330 0.9488 0.9407
Mean low bound 0.1413 0.0699 0.1195 0.1448
Median low bound 0.1388 0.0680 0.1161 0.1419
SD low bound 0.1914 0.0961 0.1901 0.1939
Mean up bound 0.8700 0.4279 0.8809 0.9039
Median up bound 0.8654 0.4249 0.8765 0.8987
SD up bound 0.1996 0.1008 0.2020 0.2084
Mean Width 0.7286 0.3580 0.7614 0.7591
Median Width 0.7266 0.3569 0.7578 0.7555
SD Width 0.0086 0.0049 0.0483 0.0456
Empirical power 0.7656 0.7609 0.7309 0.7715
δ = 1
% of Coverage 0.9299 0.9188 0.9457 0.9373
Mean low bound 0.6283 0.3075 0.5903 0.6287
Median low bound 0.6244 0.3041 0.5852 0.6240
SD low bound 0.2027 0.1041 0.2050 0.2072
Mean up bound 1.3887 0.6836 1.4095 1.4486
Median up bound 1.3821 0.6798 1.4034 1.4415
SD up bound 0.2204 0.1138 0.2227 0.2297
Mean Width 0.7604 0.3762 0.8193 0.8199
Median Width 0.7577 0.3748 0.8132 0.8139
SD Width 0.0179 0.0100 0.0667 0.0650
Empirical power 0.9994 0.9993 0.9990 0.9993
δ = 2
% of Coverage 0.9177 0.8998 0.9402 0.9319
Mean low bound 1.5762 0.7683 1.5141 1.5762
Median low bound 1.5683 0.7629 1.5067 1.5684
SD low bound 0.2363 0.1260 0.2469 0.2455
Mean up bound 2.4527 1.2101 2.4870 2.5579
Median up bound 2.4428 1.2039 2.4758 2.5463
SD up bound 0.2730 0.1450 0.2742 0.2832
Mean Width 0.8766 0.4418 0.9728 0.9817
Median Width 0.8745 0.4401 0.9615 0.9707
SD Width 0.0368 0.0201 0.1069 0.1057
Empirical power 1.0000 1.0000 1.0000 1.0000
Table 4.16: NC and BS BCa CIs under Skew-normal distributions (n1=75, n2=50, σ1=2, σ2=1)
Statistic NC(d_s) NC(Shieh’s d) BS BCa(d_s) BS BCa(g_s)
δ = 0
% of Coverage 0.9718 0.9501 0.9520 0.9442
Mean low bound -0.3586 -0.1754 -0.3211 -0.3167
Median low bound -0.3597 -0.1762 -0.3207 -0.3164
SD low bound 0.1632 0.0901 0.1631 0.1678
Mean up bound 0.3594 0.1765 0.3197 0.3204
Median up bound 0.3578 0.1753 0.3184 0.3188
SD up bound 0.1632 0.0902 0.1634 0.1684
Mean Width 0.7180 0.3519 0.6408 0.6371
Median Width 0.7175 0.3515 0.6398 0.6364
SD Width 0.0017 0.0011 0.0284 0.0239
Empirical power 0.9724 0.9511 0.9520 0.9442
δ = 0.2
% of Coverage 0.9708 0.9479 0.9517 0.9444
Mean low bound -0.1581 -0.0647 -0.1239 -0.1132
Median low bound -0.1588 -0.0653 -0.1243 -0.1140
SD low bound 0.1634 0.0905 0.1626 0.1671
Mean up bound 0.5615 0.2883 0.5226 0.5298
Median up bound 0.5587 0.2871 0.5199 0.5267
SD up bound 0.1661 0.0923 0.1671 0.1724
Mean Width 0.7196 0.3531 0.6465 0.6430
Median Width 0.7193 0.3524 0.6451 0.6418
SD Width 0.0031 0.0022 0.0310 0.0270
Empirical power 0.1634 0.2338 0.2220 0.2492
δ = 0.5
% of Coverage 0.9685 0.9450 0.9507 0.9427
Mean low bound 0.1399 0.0993 0.1682 0.1877
Median low bound 0.1388 0.0984 0.1663 0.1858
SD low bound 0.1657 0.0917 0.1645 0.1684
Mean up bound 0.8680 0.4582 0.8314 0.8484
Median up bound 0.8636 0.4562 0.8274 0.8438
SD up bound 0.1728 0.0967 0.1753 0.1811
Mean Width 0.7281 0.3589 0.6631 0.6608
Median Width 0.7266 0.3578 0.6607 0.6580
SD Width 0.0074 0.0051 0.0392 0.0368
Empirical power 0.7976 0.8598 0.8486 0.8692
δ = 1
% of Coverage 0.9637 0.9410 0.9478 0.9395
Mean low bound 0.6273 0.3669 0.6462 0.6782
Median low bound 0.6226 0.3649 0.6416 0.6742
SD low bound 0.1747 0.0964 0.1740 0.1768
Mean up bound 1.3872 0.7469 1.3570 1.3900
Median up bound 1.3821 0.7442 1.3508 1.3832
SD up bound 0.1900 0.1067 0.1950 0.2016
Mean Width 0.7599 0.3801 0.7107 0.7118
Median Width 0.7577 0.3792 0.7060 0.7070
SD Width 0.0154 0.0105 0.0580 0.0572
Empirical power 0.9999 1.0000 1.0000 1.0000
δ = 2
% of Coverage 0.9514 0.9363 0.9420 0.9348
Mean low bound 1.5763 0.8860 1.5750 1.6278
Median low bound 1.5683 0.8819 1.5668 1.6192
SD low bound 0.2088 0.1136 0.2104 0.2110
Mean up bound 2.4526 1.3410 2.4393 2.5034
Median up bound 2.4410 1.3354 2.4274 2.4912
SD up bound 0.2413 0.1343 0.2495 0.2583
Mean Width 0.8762 0.4549 0.8643 0.8756
Median Width 0.8745 0.4535 0.8547 0.8662
SD Width 0.0326 0.0213 0.0972 0.0977
Empirical power 1.0000 1.0000 1.0000 1.0000
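The BS BCa columns in the tables above come from bias-corrected and accelerated bootstrap intervals. The construction can be sketched as follows; this is an illustrative Python sketch (the function names, sample sizes, and number of bootstrap replicates are assumptions for the example, not the settings of the thesis simulations).

```python
import numpy as np
from scipy.stats import norm

def cohens_ds(x, y):
    """Cohen's d_s: mean difference over the pooled standard deviation."""
    nx, ny = len(x), len(y)
    sp = np.sqrt(((nx - 1) * np.var(x, ddof=1) + (ny - 1) * np.var(y, ddof=1))
                 / (nx + ny - 2))
    return (np.mean(x) - np.mean(y)) / sp

def bca_ci(x, y, stat=cohens_ds, n_boot=2000, alpha=0.05, seed=1):
    """BCa bootstrap CI for a two-sample statistic."""
    rng = np.random.default_rng(seed)
    theta = stat(x, y)
    # Resample each group independently, with replacement
    boots = np.array([stat(rng.choice(x, len(x)), rng.choice(y, len(y)))
                      for _ in range(n_boot)])
    # Bias correction: proportion of replicates below the point estimate
    z0 = norm.ppf(np.mean(boots < theta))
    # Acceleration from jackknife (leave-one-out) influence values
    jack = np.array([stat(np.delete(x, i), y) for i in range(len(x))] +
                    [stat(x, np.delete(y, j)) for j in range(len(y))])
    d = jack.mean() - jack
    a = (d ** 3).sum() / (6 * (d ** 2).sum() ** 1.5)
    # Adjusted percentile levels of the bootstrap distribution
    z = norm.ppf([alpha / 2, 1 - alpha / 2])
    levels = norm.cdf(z0 + (z0 + z) / (1 - a * (z0 + z)))
    lo, hi = np.quantile(boots, levels)
    return lo, hi
```

For well-behaved data the adjusted percentile levels stay close to alpha/2 and 1 - alpha/2, so the BCa interval differs from a plain percentile interval mainly when the bootstrap distribution is biased or skewed, as in the small-sample skew-normal conditions above.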
Table 4.17: Type I error rates and empirical powers of Student’s t-test and Welch’s t-test for two normally distributed groups
n1 n2 σ1 σ2 μ1 μ2 | Power: Student t, Welch t | Type I error rate: Student t, Welch t
10 10 1.0 1 0.1 0 0.0555 0.0539 0.0500 0.0486
10 10 1.0 1 0.5 0 0.1850 0.1814 0.0503 0.0489
10 10 1.0 1 0.8 0 0.3961 0.3907 0.0502 0.0487
10 10 1.0 1 1.5 0 0.8873 0.8843 0.0496 0.0482
10 10 1.0 1 2.0 0 0.9882 0.9877 0.0499 0.0485
10 10 1.0 1 5.0 0 1.0000 1.0000 0.0500 0.0486
10 10 0.5 1 0.1 0 0.0630 0.0580 0.0546 0.0500
10 10 0.5 1 0.5 0 0.2746 0.2591 0.0549 0.0503
10 10 0.5 1 0.8 0 0.5747 0.5530 0.0545 0.0501
10 10 0.5 1 1.5 0 0.9782 0.9738 0.0546 0.0500
10 10 0.5 1 2.0 0 0.9995 0.9993 0.0546 0.0500
10 10 0.5 1 5.0 0 1.0000 1.0000 0.0546 0.0500
10 10 1.5 1 0.1 0 0.0549 0.0522 0.0516 0.0490
10 10 1.5 1 0.5 0 0.1348 0.1293 0.0519 0.0493
10 10 1.5 1 0.8 0 0.2679 0.2590 0.0518 0.0492
10 10 1.5 1 1.5 0 0.7013 0.6899 0.0516 0.0491
10 10 1.5 1 2.0 0 0.9116 0.9055 0.0520 0.0494
10 10 1.5 1 5.0 0 1.0000 1.0000 0.0526 0.0499
15 10 1.0 1 0.1 0 0.0563 0.0555 0.0497 0.0492
15 10 1.0 1 0.5 0 0.2173 0.2128 0.0501 0.0498
15 10 1.0 1 0.8 0 0.4667 0.4581 0.0500 0.0494
15 10 1.0 1 1.5 0 0.9404 0.9351 0.0500 0.0497
15 10 1.0 1 2.0 0 0.9968 0.9961 0.0502 0.0497
15 10 1.0 1 5.0 0 1.0000 1.0000 0.0497 0.0492
15 10 0.5 1 0.1 0 0.0990 0.0595 0.0870 0.0508
15 10 0.5 1 0.5 0 0.3779 0.2730 0.0862 0.0510
15 10 0.5 1 0.8 0 0.6998 0.5771 0.0866 0.0507
"n1 "σ2"σ1 "μ2"n2 "μ1
Page ! of !36 52
15 10 0.5 1 1.5 0 0.9930 0.9792 0.0867 0.0511
15 10 0.5 1 2.0 0 0.9999 0.9996 0.0867 0.0511
15 10 0.5 1 5.0 0 1.0000 1.0000 0.0866 0.0508
15 10 1.5 1 0.1 0 0.0395 0.0533 0.0359 0.0490
15 10 1.5 1 0.5 0 0.1267 0.1581 0.0364 0.0496
15 10 1.5 1 0.8 0 0.2813 0.3320 0.0359 0.0489
15 10 1.5 1 1.5 0 0.7710 0.8165 0.0358 0.0488
15 10 1.5 1 2.0 0 0.9537 0.9682 0.0360 0.0492
15 10 1.5 1 5.0 0 1.0000 1.0000 0.0360 0.0492
20 10 1.0 1 0.1 0 0.0573 0.0573 0.0498 0.0498
20 10 1.0 1 0.5 0 0.2384 0.2314 0.0501 0.0502
20 10 1.0 1 0.8 0 0.5144 0.4976 0.0502 0.0503
20 10 1.0 1 1.5 0 0.9624 0.9543 0.0501 0.0502
20 10 1.0 1 2.0 0 0.9988 0.9980 0.0501 0.0502
20 10 1.0 1 5.0 0 1.0000 1.0000 0.0502 0.0506
20 10 0.5 1 0.1 0 0.1291 0.0599 0.1135 0.0511
20 10 0.5 1 0.5 0 0.4457 0.2790 0.1138 0.0517
20 10 0.5 1 0.8 0 0.7651 0.5870 0.1139 0.0512
20 10 0.5 1 1.5 0 0.9968 0.9815 0.1132 0.0510
20 10 0.5 1 2.0 0 1.0000 0.9997 0.1137 0.0510
20 10 0.5 1 5.0 0 1.0000 1.0000 0.1138 0.0514
20 10 1.5 1 0.1 0 0.0306 0.0544 0.0271 0.0493
20 10 1.5 1 0.5 0 0.1206 0.1792 0.0274 0.0492
20 10 1.5 1 0.8 0 0.2906 0.3838 0.0273 0.0496
20 10 1.5 1 1.5 0 0.8132 0.8764 0.0273 0.0493
20 10 1.5 1 2.0 0 0.9726 0.9860 0.0271 0.0491
20 10 1.5 1 5.0 0 1.0000 1.0000 0.0277 0.0499
20 20 1.0 1 0.1 0 0.0612 0.0608 0.0502 0.0498
20 20 1.0 1 0.5 0 0.3387 0.3376 0.0498 0.0495
20 20 1.0 1 0.8 0 0.6936 0.6925 0.0502 0.0499
20 20 1.0 1 1.5 0 0.9963 0.9962 0.0499 0.0495
20 20 1.0 1 2.0 0 1.0000 1.0000 0.0499 0.0496
20 20 1.0 1 5.0 0 1.0000 1.0000 0.0498 0.0494
20 20 0.5 1 0.1 0 0.0699 0.0669 0.0526 0.0502
20 20 0.5 1 0.5 0 0.4978 0.4885 0.0526 0.0502
20 20 0.5 1 0.8 0 0.8758 0.8703 0.0522 0.0499
20 20 0.5 1 1.5 0 0.9999 0.9999 0.0520 0.0497
20 20 0.5 1 2.0 0 1.0000 1.0000 0.0525 0.0500
20 20 0.5 1 5.0 0 1.0000 1.0000 0.0526 0.0502
20 20 1.5 1 0.1 0 0.0577 0.0564 0.0509 0.0498
20 20 1.5 1 0.5 0 0.2286 0.2254 0.0508 0.0496
20 20 1.5 1 0.8 0 0.4916 0.4869 0.0508 0.0496
20 20 1.5 1 1.5 0 0.9514 0.9500 0.0510 0.0499
20 20 1.5 1 2.0 0 0.9979 0.9978 0.0508 0.0497
20 20 1.5 1 5.0 0 1.0000 1.0000 0.0508 0.0496
30 20 1.0 1 0.1 0 0.0631 0.0629 0.0503 0.0502
30 20 1.0 1 0.5 0 0.3958 0.3929 0.0503 0.0502
30 20 1.0 1 0.8 0 0.7754 0.7717 0.0499 0.0499
30 20 1.0 1 1.5 0 0.9991 0.9990 0.0498 0.0496
30 20 1.0 1 2.0 0 1.0000 1.0000 0.0497 0.0497
30 20 1.0 1 5.0 0 1.0000 1.0000 0.0500 0.0499
30 20 0.5 1 0.1 0 0.1100 0.0685 0.0844 0.0502
30 20 0.5 1 0.5 0 0.6179 0.5130 0.0840 0.0499
30 20 0.5 1 0.8 0 0.9348 0.8891 0.0846 0.0502
30 20 0.5 1 1.5 0 1.0000 1.0000 0.0841 0.0498
30 20 0.5 1 2.0 0 1.0000 1.0000 0.0847 0.0502
30 20 0.5 1 5.0 0 1.0000 1.0000 0.0846 0.0499
30 20 1.5 1 0.1 0 0.0425 0.0588 0.0351 0.0494
30 20 1.5 1 0.5 0 0.2336 0.2830 0.0347 0.0490
30 20 1.5 1 0.8 0 0.5394 0.6011 0.0352 0.0498
30 20 1.5 1 1.5 0 0.9788 0.9860 0.0350 0.0494
30 20 1.5 1 2.0 0 0.9997 0.9998 0.0350 0.0497
30 20 1.5 1 5.0 0 1.0000 1.0000 0.0348 0.0493
40 20 1.0 1 0.1 0 0.0651 0.0650 0.0496 0.0496
40 20 1.0 1 0.5 0 0.4344 0.4280 0.0501 0.0502
40 20 1.0 1 0.8 0 0.8185 0.8115 0.0498 0.0501
40 20 1.0 1 1.5 0 0.9997 0.9996 0.0502 0.0500
40 20 1.0 1 2.0 0 1.0000 1.0000 0.0507 0.0507
40 20 1.0 1 5.0 0 1.0000 1.0000 0.0499 0.0502
40 20 0.5 1 0.1 0 0.1425 0.0697 0.1116 0.0506
40 20 0.5 1 0.5 0 0.6859 0.5260 0.1118 0.0502
40 20 0.5 1 0.8 0 0.9579 0.8984 0.1122 0.0505
40 20 0.5 1 1.5 0 1.0000 1.0000 0.1120 0.0501
40 20 0.5 1 2.0 0 1.0000 1.0000 0.1114 0.0503
40 20 0.5 1 5.0 0 1.0000 1.0000 0.1113 0.0501
40 20 1.5 1 0.1 0 0.0333 0.0601 0.0265 0.0497
40 20 1.5 1 0.5 0 0.2353 0.3240 0.0267 0.0500
40 20 1.5 1 0.8 0 0.5711 0.6727 0.0264 0.0495
40 20 1.5 1 1.5 0 0.9885 0.9946 0.0266 0.0498
40 20 1.5 1 2.0 0 0.9999 1.0000 0.0269 0.0503
40 20 1.5 1 5.0 0 1.0000 1.0000 0.0263 0.0498
50 50 1.0 1 0.1 0 0.0784 0.0783 0.0502 0.0501
50 50 1.0 1 0.5 0 0.6974 0.6973 0.0500 0.0500
50 50 1.0 1 0.8 0 0.9771 0.9771 0.0499 0.0499
50 50 1.0 1 1.5 0 1.0000 1.0000 0.0499 0.0498
50 50 1.0 1 2.0 0 1.0000 1.0000 0.0501 0.0500
50 50 1.0 1 5.0 0 1.0000 1.0000 0.0504 0.0503
50 50 0.5 1 0.1 0 0.0974 0.0958 0.0512 0.0502
50 50 0.5 1 0.5 0 0.8791 0.8772 0.0507 0.0498
50 50 0.5 1 0.8 0 0.9988 0.9988 0.0509 0.0500
50 50 0.5 1 1.5 0 1.0000 1.0000 0.0512 0.0501
50 50 0.5 1 2.0 0 1.0000 1.0000 0.0511 0.0501
50 50 0.5 1 5.0 0 1.0000 1.0000 0.0516 0.0506
50 50 1.5 1 0.1 0 0.0679 0.0674 0.0502 0.0497
50 50 1.5 1 0.5 0 0.4926 0.4910 0.0504 0.0500
50 50 1.5 1 0.8 0 0.8735 0.8727 0.0503 0.0499
50 50 1.5 1 1.5 0 0.9999 0.9999 0.0505 0.0500
50 50 1.5 1 2.0 0 1.0000 1.0000 0.0504 0.0500
50 50 1.5 1 5.0 0 1.0000 1.0000 0.0508 0.0503
75 50 1.0 1 0.1 0 0.0849 0.0848 0.0501 0.0502
75 50 1.0 1 0.5 0 0.7751 0.7741 0.0502 0.0501
75 50 1.0 1 0.8 0 0.9915 0.9913 0.0499 0.0499
75 50 1.0 1 1.5 0 1.0000 1.0000 0.0498 0.0498
75 50 1.0 1 2.0 0 1.0000 1.0000 0.0498 0.0498
75 50 1.0 1 5.0 0 1.0000 1.0000 0.0498 0.0498
75 50 0.5 1 0.1 0 0.1477 0.0987 0.0828 0.0497
75 50 0.5 1 0.5 0 0.9350 0.8972 0.0836 0.0500
75 50 0.5 1 0.8 0 0.9997 0.9993 0.0832 0.0500
75 50 0.5 1 1.5 0 1.0000 1.0000 0.0834 0.0499
75 50 0.5 1 2.0 0 1.0000 1.0000 0.0838 0.0500
75 50 0.5 1 5.0 0 1.0000 1.0000 0.0833 0.0504
75 50 1.5 1 0.1 0 0.0527 0.0727 0.0347 0.0502
75 50 1.5 1 0.5 0 0.5404 0.6019 0.0347 0.0501
75 50 1.5 1 0.8 0 0.9234 0.9440 0.0348 0.0500
75 50 1.5 1 1.5 0 1.0000 1.0000 0.0346 0.0501
75 50 1.5 1 2.0 0 1.0000 1.0000 0.0347 0.0501
75 50 1.5 1 5.0 0 1.0000 1.0000 0.0348 0.0500
100 50 1.0 1 0.1 0 0.0883 0.0880 0.0499 0.0500
100 50 1.0 1 0.5 0 0.8175 0.8149 0.0495 0.0495
100 50 1.0 1 0.8 0 0.9956 0.9954 0.0499 0.0500
100 50 1.0 1 1.5 0 1.0000 1.0000 0.0500 0.0499
100 50 1.0 1 2.0 0 1.0000 1.0000 0.0498 0.0500
100 50 1.0 1 5.0 0 1.0000 1.0000 0.0500 0.0500
100 50 0.5 1 0.1 0 0.1869 0.1003 0.1105 0.0499
100 50 0.5 1 0.5 0 0.9568 0.9065 0.1106 0.0503
100 50 0.5 1 0.8 0 0.9999 0.9995 0.1105 0.0499
100 50 0.5 1 1.5 0 1.0000 1.0000 0.1105 0.0503
100 50 0.5 1 2.0 0 1.0000 1.0000 0.1105 0.0499
100 50 0.5 1 5.0 0 1.0000 1.0000 0.1102 0.0502
100 50 1.5 1 0.1 0 0.0436 0.0768 0.0257 0.0499
100 50 1.5 1 0.5 0 0.5704 0.6730 0.0262 0.0500
100 50 1.5 1 0.8 0 0.9473 0.9706 0.0261 0.0499
100 50 1.5 1 1.5 0 1.0000 1.0000 0.0260 0.0500
100 50 1.5 1 2.0 0 1.0000 1.0000 0.0262 0.0501
100 50 1.5 1 5.0 0 1.0000 1.0000 0.0262 0.0502
Table 4.18: Type I error rates and empirical powers in Student's t-test, Welch's t-test and Mann-Whitney U test for two independent groups following skew-normal distributions
(Columns: n1, n2, σ1, σ2, μ1, μ2, followed by the empirical power and the Type I error rate of the Student t-test, the Welch t-test and the Mann-Whitney U test.)
10 10 1 1 -1 0 0.5637 0.5584 0.5403 0.0519 0.0502 0.0469
15 10 1 1 -1 0 0.6435 0.6360 0.6468 0.0518 0.0498 0.0519
20 10 1 1 -1 0 0.6930 0.6798 0.7067 0.0515 0.0502 0.0535
20 20 1 1 -1 0 0.8552 0.8545 0.8711 0.0505 0.0500 0.0554
30 20 1 1 -1 0 0.9125 0.9143 0.9273 0.0507 0.0499 0.0555
40 20 1 1 -1 0 0.9397 0.9404 0.9530 0.0505 0.0499 0.0565
50 50 1 1 -1 0 0.9977 0.9977 0.9987 0.0502 0.0501 0.0641
75 50 1 1 -1 0 0.9995 0.9996 0.9998 0.0499 0.0498 0.0671
100 50 1 1 -1 0 0.9999 0.9999 1.0000 0.0501 0.0499 0.0693
10 10 2 1 -2 0 0.7344 0.7163 0.7167 0.0624 0.0586 0.0690
15 10 2 1 -2 0 0.7875 0.8507 0.8279 0.0370 0.0539 0.0608
20 10 2 1 -2 0 0.8246 0.9179 0.8884 0.0241 0.0522 0.0535
20 20 2 1 -2 0 0.9512 0.9481 0.9606 0.0569 0.0548 0.0952
30 20 2 1 -2 0 0.9781 0.9892 0.9892 0.0322 0.0519 0.0828
40 20 2 1 -2 0 0.9892 0.9976 0.9968 0.0206 0.0512 0.0764
50 50 2 1 -2 0 0.9999 0.9999 1.0000 0.0527 0.0517 0.1535
75 50 2 1 -2 0 1.0000 1.0000 1.0000 0.0292 0.0511 0.1547
100 50 2 1 -2 0 1.0000 1.0000 1.0000 0.0180 0.0505 0.1538
10 10 2 1 -2 -1 0.3170 0.3055 0.3356 0.0610 0.0569 0.0682
15 10 2 1 -2 -1 0.3077 0.3758 0.3984 0.0354 0.0512 0.0584
20 10 2 1 -2 -1 0.2998 0.4335 0.4405 0.0229 0.0493 0.0500
20 20 2 1 -2 -1 0.5086 0.5004 0.6159 0.0562 0.0540 0.0945
30 20 2 1 -2 -1 0.5336 0.6242 0.7062 0.0313 0.0507 0.0799
40 20 2 1 -2 -1 0.5525 0.7129 0.7707 0.0199 0.0501 0.0720
50 50 2 1 -2 -1 0.8601 0.8577 0.9447 0.0527 0.0518 0.1505
75 50 2 1 -2 -1 0.9095 0.9461 0.9819 0.0287 0.0502 0.1471
100 50 2 1 -2 -1 0.9364 0.9780 0.9936 0.0180 0.0504 0.1458
Table 4.19: Type I error rates and empirical powers in Student’s t-test, Welch’s t-test and Mann-Whitney U test for two independent groups following SAS-normal distributions
(Columns: n1, n2, σ1, σ2, μ1, μ2, followed by the empirical power and the Type I error rate of the Student t-test, the Welch t-test and the Mann-Whitney U test.)
10 10 1.6153 1 1.5918 0 0.7531 0.7391 0.6761 0.0500 0.0486 0.0433
15 10 1.6153 1 1.5918 0 0.8253 0.8659 0.7792 0.0497 0.0493 0.0473
20 10 1.6153 1 1.5918 0 0.8639 0.9157 0.8298 0.0495 0.0501 0.0489
20 20 1.6153 1 1.5918 0 0.9811 0.9805 0.9508 0.0502 0.0498 0.0490
30 20 1.6153 1 1.5918 0 0.9932 0.9953 0.9778 0.0498 0.0498 0.0481
40 20 1.6153 1 1.5918 0 0.9967 0.9983 0.9875 0.0499 0.0499 0.0484
50 50 1.6153 1 1.5918 0 1.0000 1.0000 1.0000 0.0499 0.0499 0.0497
75 50 1.6153 1 1.5918 0 1.0000 1.0000 1.0000 0.0499 0.0498 0.0493
100 50 1.6153 1 1.5918 0 1.0000 1.0000 1.0000 0.0502 0.0501 0.0495
10 10 1.1473 1 -0.7058 0 0.2626 0.2565 0.2138 0.0498 0.0483 0.0430
15 10 1.1473 1 -0.7058 0 0.3097 0.3298 0.2696 0.0496 0.0492 0.0473
20 10 1.1473 1 -0.7058 0 0.3412 0.3726 0.3028 0.0497 0.0501 0.0489
20 20 1.1473 1 -0.7058 0 0.5225 0.5206 0.4343 0.0502 0.0498 0.0494
30 20 1.1473 1 -0.7058 0 0.5996 0.6174 0.5034 0.0501 0.0500 0.0480
40 20 1.1473 1 -0.7058 0 0.6468 0.6722 0.5508 0.0501 0.0501 0.0486
50 50 1.1473 1 -0.7058 0 0.9120 0.9119 0.8261 0.0498 0.0498 0.0494
75 50 1.1473 1 -0.7058 0 0.9526 0.9557 0.8885 0.0500 0.0499 0.0494
100 50 1.1473 1 -0.7058 0 0.9692 0.9726 0.9183 0.0500 0.0501 0.0493
10 10 1.6153 1.1473 -1.5918 -0.7058 0.2443 0.2321 0.2138 0.0491 0.0465 0.0431
15 10 1.6153 1.1473 -1.5918 -0.7058 0.2718 0.3322 0.2701 0.0495 0.0491 0.0478
20 10 1.6153 1.1473 -1.5918 -0.7058 0.2895 0.3914 0.3027 0.0495 0.0513 0.0491
20 20 1.6153 1.1473 -1.5918 -0.7058 0.4966 0.4923 0.4360 0.0494 0.0488 0.0491
30 20 1.6153 1.1473 -1.5918 -0.7058 0.5578 0.6085 0.5037 0.0500 0.0500 0.0485
40 20 1.6153 1.1473 -1.5918 -0.7058 0.5960 0.6722 0.5499 0.0494 0.0506 0.0484
50 50 1.6153 1.1473 -1.5918 -0.7058 0.8956 0.8953 0.8260 0.0494 0.0493 0.0493
75 50 1.6153 1.1473 -1.5918 -0.7058 0.9398 0.9492 0.8885 0.0499 0.0500 0.0495
100 50 1.6153 1.1473 -1.5918 -0.7058 0.9586 0.9693 0.9184 0.0503 0.0508 0.0497
Table 4.20: Type I error rates and empirical powers in Student's t-test, Welch's t-test and Mann-Whitney U test for two independent groups following Gamma distributions
(Columns: n1, n2, σ1, σ2, μ1, μ2, followed by the empirical power and the Type I error rate of the Student t-test, the Welch t-test and the Mann-Whitney U test.)
10 10 2 1.4142 1 0.7071 0.4865 0.4568 0.4431 0.0848 0.0810 0.1511
15 10 2 1.4142 1 0.7071 0.5300 0.6353 0.5404 0.0614 0.0598 0.1652
20 10 2 1.4142 1 0.7071 0.5600 0.7262 0.5976 0.0463 0.0502 0.1719
20 20 2 1.4142 1 0.7071 0.8331 0.8276 0.7796 0.0743 0.0729 0.2827
30 20 2 1.4142 1 0.7071 0.8860 0.9202 0.8561 0.0490 0.0589 0.3127
40 20 2 1.4142 1 0.7071 0.9141 0.9532 0.8969 0.0348 0.0525 0.3369
50 50 2 1.4142 1 0.7071 0.9977 0.9977 0.9927 0.0627 0.0621 0.5873
75 50 2 1.4142 1 0.7071 0.9996 0.9997 0.9983 0.0376 0.0548 0.6764
100 50 2 1.4142 1 0.7071 0.9998 0.9999 0.9994 0.0252 0.0516 0.7344
10 10 2 1.4142 3 1.0000 0.4583 0.4517 0.4871 0.0547 0.0525 0.0524
15 10 2 1.4142 3 1.0000 0.4900 0.5253 0.5883 0.0415 0.0504 0.0509
20 10 2 1.4142 3 1.0000 0.5093 0.5769 0.6501 0.0332 0.0499 0.0481
20 20 2 1.4142 3 1.0000 0.6983 0.6950 0.8177 0.0525 0.0517 0.0663
30 20 2 1.4142 3 1.0000 0.7552 0.7967 0.8928 0.0389 0.0504 0.0601
40 20 2 1.4142 3 1.0000 0.7919 0.8560 0.9320 0.0309 0.0499 0.0572
50 50 2 1.4142 3 1.0000 0.9663 0.9659 0.9958 0.0512 0.0509 0.0902
75 50 2 1.4142 3 1.0000 0.9862 0.9915 0.9994 0.0373 0.0503 0.0897
100 50 2 1.4142 3 1.0000 0.9933 0.9974 0.9999 0.0296 0.0501 0.0890
10 10 1 0.7071 3 1.0000 0.9941 0.9938 0.9931 0.0514 0.0490 0.0457
15 10 1 0.7071 3 1.0000 0.9987 0.9983 0.9986 0.0677 0.0505 0.0593
20 10 1 0.7071 3 1.0000 0.9996 0.9991 0.9995 0.0807 0.0514 0.0673
20 20 1 0.7071 3 1.0000 1.0000 1.0000 1.0000 0.0510 0.0500 0.0521
30 20 1 0.7071 3 1.0000 1.0000 1.0000 1.0000 0.0676 0.0503 0.0603
40 20 1 0.7071 3 1.0000 1.0000 1.0000 1.0000 0.0804 0.0507 0.0670
50 50 1 0.7071 3 1.0000 1.0000 1.0000 1.0000 0.0502 0.0498 0.0533
75 50 1 0.7071 3 1.0000 1.0000 1.0000 1.0000 0.0670 0.0500 0.0623
100 50 1 0.7071 3 1.0000 1.0000 1.0000 1.0000 0.0794 0.0498 0.0685
Note. The results are the numbers of observations required in the control group (n2); SSR is the sample size ratio (n1 : n2), from which the sample size of the experimental group (n1) can also be calculated. SDR is the ratio of the standard deviations (σ1 : σ2). The effect size is calculated by Cohen's ds for Student's t-tests and Shieh's d for Welch's t-tests; Δ is the difference between the two group means (rounded to 2 decimals).
Table 4.21: Required sample sizes in t-tests to achieve 80% power (under constant effect sizes)
(Rows are grouped by SDR (SD1/SD2); within each group, the columns are the effect size, followed by Δ and n2 for the Student t-test and the Welch t-test under SSR = 0.5, SSR = 1 and SSR = 2.)
0.5
0.1 0.09 2356 0.15 525 0.08 1571 0.16 394 0.07 1178 0.18 264
0.3 0.26 263 0.45 60 0.24 176 0.47 45 0.21 132 0.55 31
0.5 0.43 96 0.75 23 0.40 64 0.79 18 0.35 48 0.92 13
0.8 0.69 39 1.20 10 0.63 26 1.26 10 0.56 20 1.47 10
1 0.87 25 1.50 10 0.79 17 1.58 10 0.70 13 1.84 10
1
0.1 0.10 2356 0.21 526 0.10 1571 0.20 394 0.10 1178 0.21 263
0.3 0.30 263 0.64 61 0.30 176 0.60 45 0.30 132 0.64 31
0.5 0.50 96 1.06 24 0.50 64 1.00 17 0.50 48 1.06 12
0.8 0.80 39 1.70 11 0.80 26 1.60 10 0.80 20 1.70 10
1 1.00 25 2.12 10 1.00 17 2.00 10 1.00 13 2.12 10
2
0.1 0.14 2356 0.37 527 0.16 1571 0.32 394 0.17 1178 0.30 263
0.3 0.42 263 1.10 62 0.47 176 0.95 45 0.52 132 0.90 30
0.5 0.71 96 1.84 25 0.79 64 1.58 18 0.87 48 1.50 12
0.8 1.13 39 2.94 12 1.26 26 2.53 10 1.39 20 2.40 10
1 1.40 25 3.67 10 1.58 17 3.16 10 1.74 13 3.00 10
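As a check on how the Δ columns arise, they can be reproduced from the tabulated effect sizes. The sketch below is self-contained and uses two illustrative cells of Table 4.21 (SDR = 2, SSR = 1, effect size 0.5): Cohen's ds scales d by the pooled standard deviation, while Shieh's d scales it by sqrt(n1 + n2) times the standard error of the mean difference.

```r
# Delta implied by Cohen's ds (Student t-test): d * pooled SD
n1 <- 64; n2 <- 64; s1 <- 2; s2 <- 1; d <- 0.5
delta.student <- d * sqrt(((n1-1)*s1^2 + (n2-1)*s2^2) / (n1+n2-2))

# Delta implied by Shieh's d (Welch t-test): d * sqrt(n1+n2) * SE
n1 <- 18; n2 <- 18
delta.welch <- d * sqrt(n1+n2) * sqrt(s1^2/n1 + s2^2/n2)

round(c(delta.student, delta.welch), 2)  # 0.79 and 1.58, as in Table 4.21
```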
Note. The results are the numbers of observations required in the control group (n2); SSR is the sample size ratio (n1 : n2), from which the sample size of the experimental group (n1) can also be calculated. SDR is the ratio of the standard deviations (σ1 : σ2). The effect size is calculated by Cohen's ds for Student's t-tests and Shieh's d for Welch's t-tests; Δ is the difference between the two group means (rounded to 2 decimals).
Table 4.22: Required sample sizes in t-tests to achieve 80% power (under constant mean difference)
(Rows are grouped by SDR (SD1/SD2); within each group, the columns are Δ followed by the required n2 for the Student t-test and the Welch t-test under SSR = 0.5, SSR = 1 and SSR = 2.)
0.5
0.1 1768 1179 983 983 590 885
0.3 198 133 110 111 66 100
0.5 73 49 41 41 25 37
0.8 30 20 17 17 10 16
1 20 14 11 12 10 11
1
0.1 2356 2357 1571 1571 1178 1179
0.3 263 264 176 176 132 132
0.5 96 97 64 64 48 49
0.8 39 39 26 26 20 20
1 25 26 17 17 13 13
2
0.1 4711 7068 3926 3926 3533 2356
0.3 525 788 438 438 394 263
0.5 190 286 158 159 143 95
0.8 75 114 63 63 56 38
1 49 74 41 41 37 25
Syntax 1: R functions to obtain confidence intervals
#---------------------------------------------------------------------------
# Obtain confidence limits for Cohen's ds following the noncentral
# t-distribution
#---------------------------------------------------------------------------
Par.CL <- function(Group.1, Group.2){
  n1 <- length(Group.1)
  n2 <- length(Group.2)
  # perform the two-sample t-test assuming equal variances
  # (the same assumptions as Cohen's ds)
  t <- t.test(Group.1, Group.2, alternative = "two.sided",
              var.equal = TRUE)$statistic
  # find the noncentrality parameter lambda such that pt = 0.025 or 0.975
  lambda <- 0.01
  if (pt(q = t, df = n1+n2-2, ncp = lambda) > 0.025) {
    while (pt(q = t, df = n1+n2-2, ncp = lambda) - 0.025 > 0.0001) {
      lambda <- lambda + 0.01
    }
  } else if (pt(q = t, df = n1+n2-2, ncp = lambda) < 0.025) {
    while (0.025 - pt(q = t, df = n1+n2-2, ncp = lambda) > 0.0001) {
      lambda <- lambda - 0.01
    }
  }
  lambda.1 <- lambda
  delta.1 <- lambda.1 * sqrt(1/n1 + 1/n2)
  lambda <- 0.01
  if (pt(q = t, df = n1+n2-2, ncp = lambda) > 0.975) {
    while (pt(q = t, df = n1+n2-2, ncp = lambda) - 0.975 > 0.0001) {
      lambda <- lambda + 0.01
    }
  } else if (pt(q = t, df = n1+n2-2, ncp = lambda) < 0.975) {
    while (0.975 - pt(q = t, df = n1+n2-2, ncp = lambda) > 0.0001) {
      lambda <- lambda - 0.01
    }
  }
  lambda.2 <- lambda
  delta.2 <- lambda.2 * sqrt(1/n1 + 1/n2)
  delta.low <- min(delta.1, delta.2)
  delta.upp <- max(delta.1, delta.2)
  return(c(delta.low, delta.upp))
}
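As a side note (not part of the original syntax), the 0.01 step search for the noncentrality parameter can be replaced by a numerical root finder; the search interval and tolerance below are illustrative assumptions that are wide and tight enough in practice.

```r
# Hypothetical alternative: invert the noncentral t CDF with uniroot()
# instead of stepping lambda in increments of 0.01.
ncp.limit <- function(t, df, p) {
  # find lambda with pt(t, df, ncp = lambda) = p; pt() is strictly
  # decreasing in the noncentrality parameter, so the root is unique
  uniroot(function(l) pt(q = t, df = df, ncp = l) - p,
          interval = c(t - 50, t + 50), tol = 1e-8)$root
}

# e.g. the two limits of the noncentrality parameter for an observed t = 2.5:
ncp.limit(t = 2.5, df = 38, p = 0.975)  # lower confidence limit
ncp.limit(t = 2.5, df = 38, p = 0.025)  # upper confidence limit
```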
#---------------------------------------------------------------------------
# Obtain Welch confidence limits following Shieh's procedure
#---------------------------------------------------------------------------
Shieh.CL <- function(Group.1, Group.2){
  n1 <- length(Group.1)
  n2 <- length(Group.2)
  s1 <- sd(Group.1)
  s2 <- sd(Group.2)
  # perform the two-sample Welch t-test (the same assumptions as Shieh's d)
  V0 <- t.test(Group.1, Group.2, alternative = "two.sided",
               var.equal = FALSE)$statistic
  # sample estimate of the degrees of freedom nu of the noncentral
  # t-distribution
  nu <- (s1^2/n1 + s2^2/n2)^2 /
    ((s1^2/n1)^2/(n1-1) + (s2^2/n2)^2/(n2-1))
  # obtain the point estimator of the standardized mean difference
  G.nu <- gamma(nu/2) / (sqrt((n1+n2)*nu/2) * gamma((nu-1)/2))
  delta.nu <- G.nu * V0
  # find the noncentrality parameter lambda such that pt = 0.025 or 0.975
  lambda <- 0.01
  if (pt(q = V0, df = nu, ncp = lambda) > 0.025) {
    while (pt(q = V0, df = nu, ncp = lambda) - 0.025 > 0.0001) {
      lambda <- lambda + 0.01
    }
  } else if (pt(q = V0, df = nu, ncp = lambda) < 0.025) {
    while (0.025 - pt(q = V0, df = nu, ncp = lambda) > 0.0001) {
      lambda <- lambda - 0.01
    }
  }
  lambda.1 <- lambda
  delta.1 <- lambda.1 / sqrt(n1+n2)
  lambda <- 0.01
  if (pt(q = V0, df = nu, ncp = lambda) > 0.975) {
    while (pt(q = V0, df = nu, ncp = lambda) - 0.975 > 0.0001) {
      lambda <- lambda + 0.01
    }
  } else if (pt(q = V0, df = nu, ncp = lambda) < 0.975) {
    while (0.975 - pt(q = V0, df = nu, ncp = lambda) > 0.0001) {
      lambda <- lambda - 0.01
    }
  }
  lambda.2 <- lambda
  delta.2 <- lambda.2 / sqrt(n1+n2)
  delta.low <- min(delta.1, delta.2)
  delta.upp <- max(delta.1, delta.2)
  return(c(delta.low, delta.upp, delta.nu))
}
#---------------------------------------------------------------------------
# Obtain confidence limits following the bootstrap percentile and BCa methods
#---------------------------------------------------------------------------

# ---- Using Cohen's ds as point estimator ----
BCa.CL.ds <- function(Group.1, Group.2, B = 1000, alpha = 0.05){
  # B: number of bootstrap samples/replications
  # alpha: type I error rate (1 - alpha is the confidence coverage)
  n1 <- length(Group.1)
  n2 <- length(Group.2)
  Bootstrap.Results <- matrix(NA, B, 1)
  for (b in 1:B) {
    Bootstrap.Results[b, 1] <- Cohens.d(sample(Group.1, size = n1, replace = TRUE),
                                        sample(Group.2, size = n2, replace = TRUE))
  }
  # jackknife values for the acceleration constant a
  Jackknife.Results <- matrix(NA, n1+n2, 1)
  Marker.1 <- seq(1, n1, 1)
  for (sample.1 in 1:n1) {
    Jackknife.Results[sample.1, 1] <- Cohens.d(Group.1[Marker.1[-sample.1]], Group.2)
  }
  Marker.2 <- seq(1, n2, 1)
  for (sample.2 in 1:n2) {
    Jackknife.Results[n1+sample.2, 1] <- Cohens.d(Group.1, Group.2[Marker.2[-sample.2]])
  }
  Mean.Jackknife <- mean(Jackknife.Results)
  a <- (sum((Mean.Jackknife - Jackknife.Results)^3)) /
    (6 * sum((Mean.Jackknife - Jackknife.Results)^2)^(3/2))
  # bias-correction constant z0 and BCa-adjusted percentiles
  z0 <- qnorm(sum(Bootstrap.Results < Cohens.d(Group.1, Group.2)) / B)
  CI.Low.BCa <- pnorm(z0 + (z0 + qnorm(alpha/2)) / (1 - a*(z0 + qnorm(alpha/2))))
  CI.Up.BCa  <- pnorm(z0 + (z0 + qnorm(1-alpha/2)) / (1 - a*(z0 + qnorm(1-alpha/2))))
  Percentile.Confidence.Limits <- c(quantile(Bootstrap.Results, alpha/2),
                                    quantile(Bootstrap.Results, 1-alpha/2))
  BCa.Confidence.Limits <- c(quantile(Bootstrap.Results, CI.Low.BCa),
                             quantile(Bootstrap.Results, CI.Up.BCa))
  # return both interval types, consistently with BCa.CL.du
  return(c(Percentile.Confidence.Limits, BCa.Confidence.Limits))
}
# ---- Using Hedges' gs as point estimator ----
BCa.CL.du <- function(Group.1, Group.2, B = 1000, alpha = 0.05) {
  # B: number of bootstrap samples/replications
  # alpha: type I error rate (1 - alpha is the confidence coverage)
  n1 <- length(Group.1)
  n2 <- length(Group.2)
  Bootstrap.Results <- matrix(NA, B, 1)
  for (b in 1:B) {
    Bootstrap.Results[b, 1] <- Unbiased.d(sample(Group.1, size = n1, replace = TRUE),
                                          sample(Group.2, size = n2, replace = TRUE))
  }
  # jackknife values for the acceleration constant a
  Jackknife.Results <- matrix(NA, n1+n2, 1)
  Marker.1 <- seq(1, n1, 1)
  for (sample.1 in 1:n1) {
    Jackknife.Results[sample.1, 1] <- Unbiased.d(Group.1[Marker.1[-sample.1]], Group.2)
  }
  Marker.2 <- seq(1, n2, 1)
  for (sample.2 in 1:n2) {
    Jackknife.Results[n1+sample.2, 1] <- Unbiased.d(Group.1, Group.2[Marker.2[-sample.2]])
  }
  Mean.Jackknife <- mean(Jackknife.Results)
  a <- (sum((Mean.Jackknife - Jackknife.Results)^3)) /
    (6 * sum((Mean.Jackknife - Jackknife.Results)^2)^(3/2))
  # bias-correction constant z0 and BCa-adjusted percentiles
  z0 <- qnorm(sum(Bootstrap.Results < Unbiased.d(Group.1, Group.2)) / B)
  CI.Low.BCa <- pnorm(z0 + (z0 + qnorm(alpha/2)) / (1 - a*(z0 + qnorm(alpha/2))))
  CI.Up.BCa  <- pnorm(z0 + (z0 + qnorm(1-alpha/2)) / (1 - a*(z0 + qnorm(1-alpha/2))))
  Percentile.Confidence.Limits <- c(quantile(Bootstrap.Results, alpha/2),
                                    quantile(Bootstrap.Results, 1-alpha/2))
  BCa.Confidence.Limits <- c(quantile(Bootstrap.Results, CI.Low.BCa),
                             quantile(Bootstrap.Results, CI.Up.BCa))
  return(c(Percentile.Confidence.Limits, BCa.Confidence.Limits))
}
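For comparison (again not part of the original syntax), both interval types can also be obtained with the boot package that ships with R. The cohen.d.stat function below is a hypothetical stand-in for the Cohens.d helper used above, rewritten in the index-based form that boot expects; the data are simulated for illustration only.

```r
library(boot)  # recommended package, part of a standard R installation

# hypothetical stand-in for the Cohens.d helper, in boot's (data, indices) form
cohen.d.stat <- function(dat, idx) {
  d <- dat[idx, ]  # strata = dat$g makes boot resample within each group
  g1 <- d$y[d$g == 1]; g2 <- d$y[d$g == 2]
  sp <- sqrt(((length(g1)-1)*var(g1) + (length(g2)-1)*var(g2)) /
             (length(g1) + length(g2) - 2))
  (mean(g1) - mean(g2)) / sp
}

set.seed(1)
dat <- data.frame(y = c(rnorm(20, 0.5), rnorm(20)), g = rep(1:2, each = 20))
bt <- boot(dat, cohen.d.stat, R = 2000, strata = dat$g)
boot.ci(bt, type = c("perc", "bca"))  # percentile and BCa limits
```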
Syntax 2: R functions to calculate the required sample size
# Obtain the theoretical power of Student's t-test and Welch's t-test
t.power <- function(n1, n2, mu1, mu2, sigma1, sigma2, alpha = 0.05){
  # Student's t-test: pooled SD, df = n1 + n2 - 2
  t.nu <- n1 + n2 - 2
  t.critic <- qt(1 - alpha/2, df = t.nu)
  sd.pooled <- sqrt(((n1-1)*sigma1^2 + (n2-1)*sigma2^2) / (n1+n2-2))
  t.ncp <- abs((mu1-mu2) / (sqrt(1/n1 + 1/n2) * sd.pooled))
  t.power <- 1 - pt(q = t.critic, df = t.nu, ncp = t.ncp)
  # Welch's t-test: Satterthwaite degrees of freedom
  welch.nu <- (sigma1^2/n1 + sigma2^2/n2)^2 /
    ((sigma1^2/n1)^2/(n1-1) + (sigma2^2/n2)^2/(n2-1))
  welch.critic <- qt(1 - alpha/2, df = welch.nu)
  welch.ncp <- abs((mu1-mu2) / sqrt(sigma1^2/n1 + sigma2^2/n2))
  welch.power <- 1 - pt(q = welch.critic, df = welch.nu, ncp = welch.ncp)
  return(c(t.power, welch.power))
}
# Recover mu1 from a given standardized effect size
# (Cohen's ds for the Student t-test, Shieh's d for the Welch t-test)
t.mu1 <- function(d, mu2 = 0, sigma1, sigma2 = 1, n1, n2){
  d * sqrt(((n1-1)*sigma1^2 + (n2-1)*sigma2^2) / (n1+n2-2)) + mu2
}

welch.mu1 <- function(d, mu2 = 0, sigma1, sigma2 = 1, n1, n2){
  sqrt(n1+n2) * d * sqrt(sigma1^2/n1 + sigma2^2/n2) + mu2
}
# Find the minimum sample sizes given specific effect sizes
samplesize <- function(d, power, ssr, n2, sigma1, sigma2 = 1, mu2 = 0) {
  n0 <- n2  # back up n2 for the Welch calculation
  # Student's t-test
  n1 <- round(n2 * ssr)
  mu1 <- t.mu1(d = d, sigma1 = sigma1, sigma2 = sigma2, n1 = n1, n2 = n2)
  t.pow <- t.power(n1, n2, mu1, mu2, sigma1, sigma2)[1]
  while (t.pow < power) {
    n2 <- n2 + 1
    n1 <- round(n2 * ssr)  # keep n1 an integer, as in the initial step
    mu1 <- t.mu1(d, sigma1 = sigma1, sigma2 = sigma2, n1 = n1, n2 = n2)
    t.pow <- t.power(n1, n2, mu1, mu2, sigma1, sigma2)[1]
  }
  t.mu1 <- mu1
  t.n <- n2
  # Welch's t-test
  n2 <- n0
  n1 <- round(n2 * ssr)
  mu1 <- welch.mu1(d, sigma1 = sigma1, sigma2 = sigma2, n1 = n1, n2 = n2)
  welch.pow <- t.power(n1, n2, mu1, mu2, sigma1, sigma2)[2]
  while (welch.pow < power) {
    n2 <- n2 + 1
    n1 <- round(n2 * ssr)
    mu1 <- welch.mu1(d, sigma1 = sigma1, sigma2 = sigma2, n1 = n1, n2 = n2)
    welch.pow <- t.power(n1, n2, mu1, mu2, sigma1, sigma2)[2]
  }
  welch.mu1 <- mu1
  welch.n <- n2
  return(c(t.mu1, t.n, welch.mu1, welch.n))
}
# Obtain the required sample size to detect the same mean difference
samplesize.samemu <- function(diff, power, ssr, n2, sigma1, sigma2 = 1, mu2 = 0){
  n0 <- n2  # back up n2 for the Welch calculation
  n1 <- round(n2 * ssr)
  mu1 <- diff + mu2
  # Student's t-test
  t.pow <- t.power(n1, n2, mu1, mu2, sigma1, sigma2)[1]
  while (t.pow < power) {
    n2 <- n2 + 1
    n1 <- round(n2 * ssr)  # keep n1 an integer, as in the initial step
    t.pow <- t.power(n1, n2, mu1, mu2, sigma1, sigma2)[1]
  }
  t.n <- n2
  # Welch's t-test
  n2 <- n0
  n1 <- round(n2 * ssr)
  welch.pow <- t.power(n1, n2, mu1, mu2, sigma1, sigma2)[2]
  while (welch.pow < power) {
    n2 <- n2 + 1
    n1 <- round(n2 * ssr)
    welch.pow <- t.power(n1, n2, mu1, mu2, sigma1, sigma2)[2]
  }
  welch.n <- n2
  return(c(t.n, welch.n))
}
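As a plausibility check (a self-contained sketch, not part of the original syntax), the Student branch of the power calculation can be compared with stats::power.t.test, which evaluates the same noncentral t-distribution; the parameter values below are illustrative.

```r
# Theoretical two-sided power of Student's t-test via the noncentral
# t-distribution, rewritten in a self-contained form.
student.power <- function(n1, n2, delta, sd = 1, alpha = 0.05) {
  nu <- n1 + n2 - 2
  ncp <- delta / (sd * sqrt(1/n1 + 1/n2))
  crit <- qt(1 - alpha/2, df = nu)
  # both rejection tails of the noncentral t
  pt(crit, df = nu, ncp = ncp, lower.tail = FALSE) + pt(-crit, df = nu, ncp = ncp)
}

student.power(20, 20, 0.8)                       # roughly 0.69
power.t.test(n = 20, delta = 0.8, sd = 1)$power  # should agree closely
```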