Benchmarking effect size measures for comparing the difference
between two independent group means
Limin Liu
Master dissertation submitted
to obtain the degree of
Master of Statistical Data Analysis
Promoter: Prof. Dr. Christophe Ley
Co-promoter: Prof. Dr. Christophe Leys
Tutor: Marie Delacre
Department name of the promoter:
Department of Applied Mathematics,
Computer Science and Statistics
The author and the promoter give permission to consult this master dissertation and to copy it
or parts of it for personal use. Each other use falls under the restrictions of the copyright, in
particular concerning the obligation to mention explicitly the source when using results of this
master dissertation.
FOREWORD
Reform of statistical practice in the social and behavioural sciences requires wider use of
effect size measures and the associated confidence intervals (CIs). However, the choice of
effect size measures and of approaches to build those CIs depends on the specific conditions, which makes it complex and challenging in practice. The present work provides a guideline for reporting the appropriate effect size estimator and the associated CI in the context of comparisons between two independent group means.
The development of this work has greatly benefited from the generous help of my promotor,
Prof. Christophe Ley. The core concepts discussed in this thesis such as bias, consistency,
type I error, power, and so on were originally taught in his course Statistical Inference. The
academic training through his course equipped me with a firm foundation for conducting
statistical analysis. I especially appreciate his remarks on the draft, as well as his great support with incisive explanations, encouragement and inspiration throughout my master study. My
co-promotor, Prof. Christophe Leys (Université libre de Bruxelles) is the initiator of this
research project. I am grateful for his guidance and instructions, as well as for his valuable comments on the draft from the psychological perspective. I am also grateful to my tutor Marie Delacre
(Université libre de Bruxelles) for lots of hours-long discussions on aspects of effect size
indices, type I error rate, Welch’s t-test, and her inputs on the draft. Her previous work on the
comparison between the two t-tests enlightened this study. Undoubtedly, this work is a joint
effort of psychologists and statisticians. I would also like to thank my colleague, Emma, for every great job we accomplished together and the countless fulfilling moments in our team adventure of statistics. I am grateful for the opportunity to follow this master program, and
many thanks to all the teachers for the numerous inspiring moments during the last three
semesters.
Last but definitely not least, my deepest appreciation to Xi, Isabella and Alexander. This work
would never have been accomplished without their generosity, understanding and support.
TABLE OF CONTENTS
1 Abstract
2 Introduction
2.1 Effect size - background knowledge
2.2 Effect size indices to compare two independent group means
2.2.1 Under the assumptions of normality and homogeneity of variances
2.2.2 Alternatives to Cohen's ds under heteroscedasticity
2.2.3 Alternatives to Cohen's ds under non-normality
2.3 Confidence intervals of effect size
2.4 Effect size, significance tests and power analysis
2.5 Objectives
3 Method
3.1 Key statistics for evaluation of effect size estimators and confidence intervals
3.1.1 Bias rate, variance and consistency of the point estimators
3.1.2 Confidence intervals
3.1.3 Type I error and power for the statistical tests
3.1.4 Summary
3.2 Methods of the Monte Carlo simulations
3.2.1 Simulation design for the study of point estimators
3.2.2 Simulation design for the study of confidence intervals
3.2.3 Simulation design for the study of hypothesis tests
4 Results
4.1 Bias, consistency and variance of the point estimators
4.1.1 Cohen's ds, Hedges' gs, Glass's ds and Shieh's d under Assumption 1
4.1.2 Cohen's ds, Hedges' gs, Glass's ds and Shieh's d under Assumption 2
4.1.3 Cohen's ds, Hedges' gs, Shieh's d, dMAD, dR and PSindep under Assumptions 3 and 4
4.2 Accuracy and precision of confidence intervals
4.3 Statistical tests to compare two groups
4.3.1 Type I error rates
4.3.2 Power analysis
5 Discussion
6 References
1 Abstract
The choice of effect size measures and approaches to build the associated confidence intervals
is highly dependent on specific conditions. However, few studies in the last several decades have properly reported effect sizes. Therefore, it is of great relevance to provide a guideline for researchers to choose proper effect size measures and their confidence intervals. In the context of group comparisons, Cohen's ds, which relies on the often untenable assumptions of normality and homogeneity of variances, is the dominant effect size measure used by researchers. However, the consequences of the arbitrary use of Cohen's ds when these assumptions are violated, and the superiority of alternatives to Cohen's ds under specific conditions, remain vague. This study conducted comprehensive Monte Carlo simulations to compare Cohen's ds and its alternatives under specific conditions across all assumptions, regarding the point estimators, confidence intervals and related statistical tests. Results show that Shieh's d is mostly preferred as a default effect size measure across all assumptions, because it always provides comparably lower bias rates and maintains higher precision compared to Cohen's ds, with the only exception under heteroscedasticity when the group with fewer observations has the larger variance. Furthermore, we analysed four confidence intervals constructed through the noncentral t-distribution (NC) and the bootstrap bias-corrected and accelerated strategy (BS BCa) in terms of coverage probability and precision. The NC interval around Shieh's d is considered the optimal estimate under the assumption of normality, while the BS BCa interval around Cohen's ds shows great advantages under non-parametric conditions, especially when the observations are adequate. Moreover, we compared two t-tests and observed that, on the one hand, Welch's t-test outperformed Student's t-test in type I error rate at a cost in power. On the other hand, Welch's t-test required fewer observations than Student's t-test to achieve the same power in most conditions, except when there are fewer observations in the group with the bigger standard deviation under the assumption of normality. This study contributes to the effect size literature by providing a guideline for choosing the appropriate effect size estimator and confidence interval approach to convey quantitative information in applied research.
2 Introduction
2.1 Effect size - background knowledge
Effect size measures are one of the most important outcomes of empirical studies for three
reasons (Lakens, 2013): 1) they allow researchers to present the magnitude of effect in a
standardised manner and to communicate the practical significance of their results instead of
only reporting the statistical significance; 2) they allow researchers to draw meta-analytic
conclusions by comparing standardised effect sizes across studies; 3) the effect sizes from
previous studies can be used in a priori power analysis when planning a new study. Therefore,
understanding and distinguishing the differences among diverse effect size measures is crucial to guide researchers in choosing proper estimators in certain contexts.
Effect sizes express the magnitude of an effect. Formally, Grissom & Kim (2005) used the
term “effect size” to describe the degree to which results differ from what is implied for them
by a null hypothesis. Later, Kelley & Preacher (2012) defined effect size as a quantitative reflection of the magnitude of some phenomenon that is used for the purpose of addressing a question of interest. According to the general review of Ferguson (2009), effect sizes can be categorised into four general classes associated with different research interests: 1) group difference (known as the Cohen's d family); 2) strength of association (variance explained, known as the r family); 3) corrected estimates (adjusted R², also belonging to the r family; r and R² are measures of effect size in their own right, since r covers the whole range of relationship strengths, from 0 to 1 or -1); and 4) risk estimates (categorical association, known as the c family). The c family covers the categorical effect sizes, including Phi (φ), which is equivalent to the correlation coefficient r for the goodness of fit in 2x2 contingency tables, and Cramer's V (φc) for contingency tables larger than a 2x2 design. These estimators also norm well from 0 to 1 regardless of table size.
Each category has its own calculation rules and assumptions, resulting in various versions of estimators in response to changing conditions. In order to evaluate the suitability of effect size estimators, three important statistical properties suggested in previous studies are explored in this study: unbiasedness, consistency and efficiency (Kelley & Preacher, 2012); in addition, interpretability (Cumming, 2013) is also discussed afterwards. Moreover, a confidence interval around the effect size point estimator also serves as a null hypothesis statistical test and provides a far better understanding of the results than does a simple
significance answer. Therefore, we are also motivated to inspect accuracy and precision
properties of different interval estimations. Finally, as effect size measures originate from
statistical tests, comparisons of type I error rates and powers of different tests also deserve our
attention.
2.2 Effect size indices to compare two independent group means
Our study focuses on the effect size measures in the context of between-group comparisons.
We first go over the effect size indices interpreted as standardised mean differences under the assumptions of normality and homogeneity of variances. Next, we review the alternatives for when either or both of these assumptions are violated.
2.2.1 Under the assumptions of normality and homogeneity of variances
In the context of group means comparisons, the most commonly used and perhaps the most
intuitively appealing effect size is the standardised difference of an effect, typically termed
Cohen’s d (Cohen, 1988) which indicates the entire family of group difference effect sizes. In
association with different experimental designs, there are various versions of Cohen’s d
denoted by different subscripts (Cohen, 1988; Lakens, 2013). We limit our review to
“between-subject” designs where individuals are randomly assigned into groups under
different experimental conditions. The main task under this study design is to compare the
behaviour of those in one experimental condition with the behaviour of those in another
(Charness et al., 2012). Unpaired t-tests (or independent t-tests since the subjects in one group
are not related to those in another group) are usually applied to serve this type of experimental
design.
• Cohen's δ - population effect size
The population-standardised difference between two group means is defined by Cohen (1988) as:
Cohen's δ = (μ1 − μ2) / σ    (1)
where both populations follow a normal distribution with mean μj in the jth group (j = 1, 2) and common standard deviation σ.
• Cohen's ds - estimator of Cohen's δ
In reality, researchers seldom have access to the population standard deviation σ. Therefore, we normally estimate Cohen's δ by using the sample estimates. Cohen refers to
the standardized mean difference between two groups of independent observations for the
sample as ds, which is given by:
Cohen's ds = (X̄1 − X̄2) / √[((n1 − 1)SD1² + (n2 − 1)SD2²) / (n1 + n2 − 2)]    (2)
where the numerator is the difference between the two observed group means (X̄j is the sample mean of the jth group (j = 1, 2)); the denominator is the pooled standard deviation (the square root of the pooled variance), assuming equal standard deviations in the two populations.
• Hedges' gs - bias-corrected version of Cohen's ds
As Cohen's ds is based on sample averages, not surprisingly, it gives a biased estimate of the population effect size when the variety of the data is reduced by limited observations (n < 20). Previous research has shown that Cohen's ds tends to be larger than the population value in the long run if it is estimated from a small sample (Lakens, 2013; Cumming & Jageman, 2017). This over-estimation is due to a bias of the SD, which tends to be lower than the population SD. Because the mean is not biased, dividing it by an under-estimated SD leads to an over-estimate of Cohen's δ. The recommended bias-corrected version of Cohen's ds was originally defined by Hedges and Olkin (1985) and is calculated as:
Hedges' gs = Cohen's ds × (1 − 3 / (4(n1 + n2) − 9))    (3)
Obviously, as long as researchers report the number of participants in each condition of a between-subjects comparison and the t-value, Cohen's ds and Hedges' gs can be calculated when data are normally distributed with equal standard deviations in both groups. These effect size indices serve well under the assumptions of normality and homogeneity of variance, but they may not be well advised for use with data that violate these assumptions (Grissom & Kim, 2001; Kelley, 2005; Shieh, 2013). Actually, the presence of unequal variances is a realistic concern in psychological research, and non-normality is also not rare in practice (Delacre et al., 2017). Therefore, if two treatments produce distributions with different variances and/or different shapes, more powerful and informative alternatives should be considered.
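For illustration, formulas (2) and (3) can be sketched in Python as follows (the simulations in this work are implemented in R; this translation and its function names are our own):

```python
import numpy as np

def cohens_ds(x1, x2):
    """Cohen's d_s (formula (2)): mean difference over the pooled SD,
    assuming equal population variances."""
    x1, x2 = np.asarray(x1, dtype=float), np.asarray(x2, dtype=float)
    n1, n2 = len(x1), len(x2)
    pooled_var = ((n1 - 1) * x1.var(ddof=1) + (n2 - 1) * x2.var(ddof=1)) / (n1 + n2 - 2)
    return (x1.mean() - x2.mean()) / np.sqrt(pooled_var)

def hedges_gs(x1, x2):
    """Hedges' g_s (formula (3)): small-sample bias-corrected Cohen's d_s."""
    n1, n2 = len(x1), len(x2)
    return cohens_ds(x1, x2) * (1 - 3 / (4 * (n1 + n2) - 9))
```

For n1 = n2 = 5, the correction factor is 1 − 3/31 ≈ 0.903, so gs shrinks ds by roughly 10%, reflecting the upward bias of ds in small samples.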
2.2.2 Alternatives to Cohen's ds under heteroscedasticity
• Glass's ds
Whenever standard deviations differ substantially between conditions, one well-known solution is to apply Glass's (1976) formula, using one of the two sample standard deviations as the standardizer for the standardised mean difference; for example, if we choose the standard deviation of the control group:
Glass's ds = (X̄experimental − X̄control) / SDcontrol    (4)
Hedges' corrected version of Glass's ds can be obtained by replacing Cohen's ds with Glass's ds in formula (3).
• Shieh’s d
The second approach considers Welch’s statistic for the well-known Behrens-Fisher problem
of comparing the difference between two normal means that may have unequal population
variances (Kim & Cohen, 1998). To extend the notion of effect size within the heteroscedastic
framework, Shieh (2013), following Kulinskaya and Staudte (2007), defined the effect size
estimator Shieh's d:
Shieh's d = (X̄1 − X̄2) / √(SD1²/q1 + SD2²/q2)    (5)
where qi = ni/N, i = {1, 2}, is the group size allocation ratio, and N = ∑ni. This effect size
estimator is a function of mean difference, variance components and allocation ratios. Unlike
the homogeneous-variance case, this method takes the sample size allocation ratios into account in the calculation of the group variance.
difference needs to accommodate the design characteristic of group allocation scheme under
the more sophisticated situation of heteroscedasticity. According to the statistical properties of
Welch’s statistic under heteroscedasticity, it does not appear possible to define a proper
standardised effect size without accounting for the relative group size of subpopulations in a
sampling scheme. We consider Shieh’s d as a robust effect size measure under
heteroscedasticity and are motivated to investigate its behaviour when observations come
from normal populations with unequal variances or even from non-normal distributions.
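A sketch of formula (5) in Python (illustrative, following the same conventions as the sketch of formula (2) above):

```python
import numpy as np

def shiehs_d(x1, x2):
    """Shieh's d (formula (5)): the standardizer weights each group's
    variance by the inverse of its allocation ratio q_i = n_i / N."""
    x1, x2 = np.asarray(x1, dtype=float), np.asarray(x2, dtype=float)
    n1, n2 = len(x1), len(x2)
    q1, q2 = n1 / (n1 + n2), n2 / (n1 + n2)
    return (x1.mean() - x2.mean()) / np.sqrt(x1.var(ddof=1) / q1 + x2.var(ddof=1) / q2)
```

Note that with balanced groups and equal variances the denominator equals twice the common SD, so Shieh's d is half of Cohen's ds; the two indices live on different scales and should not be compared directly.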
2.2.3 Alternatives to Cohen's ds under non-normality
If the sample sizes are large and the variances equal, one can expect valid results from Student's t-tests and Cohen's ds, since they are robust to violations of normality when the other assumptions are met and the null hypothesis is true (Boneau 1960, in Kelley 2005). However, these restrictive conditions are not always met in practice. If the sample sizes are limited (e.g., fewer than 20 observations in each group) or the two group means differ substantially, analysing a non-normal dataset with procedures that assume normality can have seriously misleading implications for the conclusions. Thus, researchers have been exploring appropriate alternatives for estimating group mean differences when the assumption of normality is violated. We review two types of approaches here: 1) the parametric alternatives to Cohen's ds obtained by performing t-tests (including Student's t-test and Welch's t-test); 2) the non-parametric solutions. To avoid possible confusion in practice, two remarks on the non-parametric tests have to be made: 1) the non-parametric tests do not have the same H0 as t-tests, since they assume equal distributions rather than equal group means; 2) the non-parametric tests provide distinct advantages when the sample observations are extremely limited and/or the assumption of normality is violated, but they are not guaranteed to be immune to heteroscedasticity. In this study, we take a non-parametric effect size measure into account, as it is expected to provide a standard reference in the comparisons among the various effect size estimators, especially under non-parametric conditions.
• Using trimmed medians and MAD
Hedges and Olkin (1985) proposed an effective solution to handle extremely skewed datasets. It is suggested to trim the highest and lowest scores from both groups, to replace the mean difference with the median difference, and to replace the standard deviation with the MAD (see detailed illustrations of the calculation of the MAD in Leys et al., 2013). Thus, one alternative version of equation (4) is given as:
dMAD = (Mdnexp − Mdnctr) / MADctr    (6)
where Mdnexp and Mdnctr are the sample medians of the experimental and the control group. Obviously, when dealing with a sufficiently heavy-tailed distribution where the number of outliers tends to be large, comparing medians can result in larger power than methods based on the means.
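A sketch of formula (6) in Python. The 1.4826 consistency constant, which makes the MAD comparable to an SD under normality (Leys et al., 2013), is an assumption of this sketch rather than a detail stated above; set scale=1.0 for the raw MAD.

```python
import numpy as np

def d_mad(x_exp, x_ctr, scale=1.4826):
    """d_MAD (formula (6)): median difference standardised by the control
    group's MAD; scale=1.4826 makes the MAD consistent with the SD under
    normality (an assumed convention here)."""
    x_exp, x_ctr = np.asarray(x_exp, dtype=float), np.asarray(x_ctr, dtype=float)
    mad_ctr = scale * np.median(np.abs(x_ctr - np.median(x_ctr)))
    return (np.median(x_exp) - np.median(x_ctr)) / mad_ctr
```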
• Using the trimmed mean and the winsorized variance
Another robust version of Cohen's ds is constructed by applying the 20% trimmed means and the pooled 20% winsorized variance (Algina et al., 2005), which can be given as:
dR = 0.642 × (X̄t1 − X̄t2) / SDW    (7)
where X̄tj (j = 1, 2) is the 20% trimmed mean of the jth group and SDW is the square root of the pooled 20% winsorized variance of the two samples, that is,
SDW² = ((n1 − 1)SDW1² + (n2 − 1)SDW2²) / (n1 + n2 − 2)    (8)
In formula (8), SDWj is the 20% winsorized standard deviation in the jth group. The factor 0.642 in formula (7) ensures that δR = Cohen's δ when the data are drawn from normal distributions with equal variances (see detailed descriptions of the calculation of the trimmed mean and the winsorized variance in Algina et al., 2005). A 20% trimming percentage is recommended as a common choice based on comprehensive considerations of outlier removal, robustness to heterogeneity and non-normality, type I error control, and statistical power (see the justification of 20% trimming in Keselman et al., 2002).
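Formulas (7) and (8) can be sketched as follows (illustrative Python; removing g = ⌊0.2n⌋ observations from each tail is one common trimming convention, assumed here, and readers should verify the exact convention against Algina et al., 2005):

```python
import numpy as np

def trimmed_mean(x, prop=0.2):
    """Mean after removing the g = floor(prop * n) smallest and largest values."""
    x = np.sort(np.asarray(x, dtype=float))
    g = int(np.floor(prop * len(x)))
    return x[g:len(x) - g].mean()

def winsorized_var(x, prop=0.2):
    """Sample variance after pulling each tail in to the nearest retained value."""
    x = np.sort(np.asarray(x, dtype=float))
    g = int(np.floor(prop * len(x)))
    w = x.copy()
    w[:g] = x[g]                       # replace the g smallest by the next value up
    w[len(x) - g:] = x[len(x) - g - 1]  # replace the g largest by the next value down
    return w.var(ddof=1)

def d_r(x1, x2, prop=0.2):
    """Robust d_R (formulas (7)-(8)): trimmed-mean difference over the pooled
    winsorized SD, rescaled by 0.642 to match Cohen's delta under normality."""
    n1, n2 = len(x1), len(x2)
    sw2 = ((n1 - 1) * winsorized_var(x1, prop) +
           (n2 - 1) * winsorized_var(x2, prop)) / (n1 + n2 - 2)
    return 0.642 * (trimmed_mean(x1, prop) - trimmed_mean(x2, prop)) / np.sqrt(sw2)
```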
• Mann-Whitney effect size
As previously mentioned, in the case of independent group comparisons, classic non-parametric tests such as the Mann-Whitney U-test are widely applied as alternatives to t-tests without assuming normality, although they do not examine exactly the same hypotheses. For the effect size estimator, Grissom and Kim (2012) suggest obtaining the Mann-Whitney U statistic and then dividing it by the product of the two sample sizes:
PSindep = U / (n1 n2)    (9)
The U-value represents the number of times observations in group 1 precede observations in group 2 in the ranking. This effect size estimates the probability that a score randomly drawn from group 1 will be greater than a score randomly drawn from group 2. Remarkably, similar to the fact that the Mann-Whitney U test does not compare the means of two groups, this effect size measure is not a standardised mean difference index. On the other hand, since the Mann-Whitney U test is not immune to heteroscedasticity, this effect size estimator is not necessarily a convincing option when the assumptions of normality and homogeneity of
variances are both violated. We will further explore the appropriateness of this effect size measure across assumptions in the simulation study. We expect this effect size indicator to serve as a standard reference for the other alternatives, especially under non-normality and with limited observations, since it is a recommended effect size measure with no restrictions on normality or sample sizes.
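Formula (9) can be computed directly from pairwise comparisons, without calling a test routine (an illustrative sketch; counting ties as one half is one common convention for U):

```python
import numpy as np

def ps_indep(x1, x2):
    """PS_indep (formula (9)): estimated probability that a random score from
    group 1 exceeds a random score from group 2, i.e. U / (n1 * n2)."""
    x1, x2 = np.asarray(x1, dtype=float), np.asarray(x2, dtype=float)
    greater = (x1[:, None] > x2[None, :]).sum()   # pairs where group 1 wins
    ties = (x1[:, None] == x2[None, :]).sum()     # tied pairs count as 1/2
    u = greater + 0.5 * ties
    return u / (len(x1) * len(x2))
```

Two identical samples give exactly 0.5 (no stochastic superiority), and the measure approaches 1 as group 1 comes to dominate group 2.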
In summary, the proposed effect size measures to compare two independent groups are integrated in Table 2.1. Briefly, Cohen's ds is applied under the assumptions of normality and homogeneity of variances, while Glass's ds and Shieh's d are proposed when we cannot assume equal variances in the two groups. If the assumption of normality is violated, parametric effect size indices such as dMAD and dR and non-parametric alternatives such as PSindep are suggested, each with distinct advantages in dealing with specific situations.
2.3 Confidence intervals of effect size
The reporting of a confidence interval (CI) around a point estimate from a sample provides
complete information of the primary results of interest. Cumming & Finch (2001) highlighted
four reasons to use CIs: 1) They give point and interval information that is accessible and
comprehensible and so they support substantive understanding and interpretation. 2) There is
a direct link between CIs and familiar null hypothesis significance testing (NHST): Noting
that an interval excludes a value is equivalent to rejecting a hypothesis that asserts that value
as true - at a significance level related to that critical value C. A CI may be regarded as the set
of hypothetical population values consistent, in this sense, with the data. 3) CIs are useful in
the cumulation of evidence over experiments: They support meta-analysis and meta-analytic
thinking focused on estimation. 4) CIs give information about the precision of the observed
estimate. They can be estimated before conducting an experiment and the width used to guide
the choice of design and sample size. After the experiment, they give information about
precision that may be more useful and accessible than a statistical power value.
Table 2.1 Summary of effect size estimators

Assumption: Normality & homogeneity of variances
  Cohen's ds = (X̄1 − X̄2) / √[((n1 − 1)SD1² + (n2 − 1)SD2²) / (n1 + n2 − 2)]
  Hedges' gs = Cohen's ds × (1 − 3 / (4(n1 + n2) − 9))

Assumption: Homogeneity of variances is violated
  Shieh's d = (X̄1 − X̄2) / √(SD1²/q1 + SD2²/q2)
  Glass's ds = (X̄experimental − X̄control) / SDcontrol

Assumption: Normality is violated
  Parametric: dR = 0.642 × (X̄t1 − X̄t2) / SDW and dMAD = (Mdnexp − Mdnctr) / MADctr
  Non-parametric: PSindep = U / (n1 n2)
In practice, there are mainly two approaches to construct a CI around the point estimators of effect size: the parametric method based on the noncentral t-distribution, and the non-parametric solution applying bootstrapping procedures. Within the bootstrap framework, two approaches to find the confidence limits are delineated, and they have different statistical properties: 1) the percentile method, which is first-order accurate because the error of the confidence interval coverage percentage approaches zero at a rate related to 1/√min(n1, n2) (see detailed explanations in Kelley, 2005); 2) the bias-corrected and accelerated (BCa) strategy, which is second-order accurate because the over- or undercoverage of the 100(1 − α)% BCa confidence interval approaches zero at a rate related to 1/min(n1, n2), which is smaller than that of the bootstrapped percentile interval. Therefore, the second type of bootstrap confidence interval is generally recommended, and its advantage over the percentile approach in terms of coverage probability and precision is convincingly supported by evidence (Kelley, 2005). In our study, we decided to follow the BCa procedure to build the non-parametric intervals for the comparisons between parametric and non-parametric intervals. We will illustrate the concrete calculation steps in the Methods section.
2.4 Effect size, significance tests and power analysis
Any statistical relationship in a sample can be interpreted in two ways: 1) the relationship in the sample reflects a relationship in the population; 2) the relationship in the sample only reflects sampling error and there is actually no relationship in the population.
Null hypothesis testing is the formal approach to deciding between these two interpretations of a statistical relationship with the available observations. The idea that there is no relationship in the population is usually formulated as the null hypothesis (often denoted as H0), while the other interpretation is converted into the alternative hypothesis (often denoted as Ha or H1). The crucial step of null hypothesis testing is to find the likelihood of the sample results under H0. This probability is called the p-value. Higher p-values indicate that the data are consistent with H0, while extremely low p-values provide evidence against H0 and lead to its rejection. With any statistical test, however, there is always the possibility that we will find a difference between two groups when it does not actually exist. This is called a Type I error (its probability is denoted as α). Likewise, it is possible that when a difference does exist, the test will not be able to identify it. This type of mistake is called a Type II error (denoted as β). The statistical power (1 − β) refers to the probability that the test will find a statistically significant difference when such a difference actually exists. In other words, power is the probability that we correctly reject the null hypothesis (and thus avoid a Type II error).
Effect sizes are always associated with inferential statistical tests. Once a test suggests a significant effect, the effect size needs to be calculated to confirm whether this statistically significant effect is also practically meaningful or important for decision-making. Moreover, the effect size influences the power of a test; in practice, effect size is used in power analysis to determine the sample size. On the other hand, different tests may vary in their rates of type I and type II errors under different assumptions. In the context of comparisons of two independent groups, Student's t-test and Welch's t-test are the most commonly used, along with the alternative Mann-Whitney U test when data are not normally distributed. Detecting the differences among these tests will also enhance our understanding of effect size.
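The contrast between the two t-tests under heteroscedasticity is easy to demonstrate by simulation (an illustrative Python sketch; our simulations use R, and the sample sizes and SDs below are arbitrary choices for the demonstration):

```python
import numpy as np
from scipy import stats

def type1_rates(n1=20, n2=40, sd1=2.0, sd2=1.0, reps=4000, alpha=0.05, seed=42):
    """Empirical type I error rates of Student's and Welch's t-tests when H0
    (equal means) is true but the smaller group has the larger SD."""
    rng = np.random.default_rng(seed)
    rej_student = 0
    rej_welch = 0
    for _ in range(reps):
        x1 = rng.normal(0.0, sd1, n1)
        x2 = rng.normal(0.0, sd2, n2)
        rej_student += stats.ttest_ind(x1, x2, equal_var=True).pvalue < alpha
        rej_welch += stats.ttest_ind(x1, x2, equal_var=False).pvalue < alpha
    return rej_student / reps, rej_welch / reps
```

Under this configuration Student's t-test rejects a true H0 well above the nominal 5% level, while Welch's t-test stays close to it; reversing which group has the larger SD makes Student's test conservative instead.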
2.5 Objectives
In order to conduct a comprehensive study on effect size measures and confidence interval
approaches in the context of group comparisons, we designed the simulation study with three
main interests: 1) to observe the behaviour of Cohen's ds and the proposed alternatives under different assumptions; 2) to detect the differences between the parametric and non-parametric approaches for constructing confidence intervals; 3) to explore the differences among hypothesis tests for comparing two independent groups and their association with effect size. All these analyses serve to find the appropriate effect size measure(s) and associated CI approach(es) under distinct assumptions.
3 Method
In this section, we first go over the key statistics that we defined and applied to examine the statistical properties of the effect size point estimators and CIs. Then, we describe the study design of the Monte Carlo simulations. Statistics and approaches that are more likely to differ from common practice, or that may be less familiar to scientific readers, receive more detailed explanations in this section. On the other hand, the calculation of well-known statistics such as means, variances and standard deviations is not specified, as we assume these are well established in the scientific community.
3.1 Key statistics for evaluation of effect size estimators and confidence intervals
3.1.1 Bias rate, variance and consistency of the point estimators
An estimator is unbiased if, on average, it hits the true parameter value. That is, the mean of the sampling distribution of the estimator is equal to the true parameter value:
E(estimator) = parameter
To gauge the bias of the sample estimators, we define the bias rate statistic as follows:
rbias = (E(d) − δ) / δ    (10)
where E(d) is the average of the effect size estimates generated by a specific effect size estimator (such as Cohen's ds, Shieh's d, Glass's ds, etc.) and δ is the true effect size for the population. In formula (10), the numerator captures the difference between the average of the sampled estimates and the true parameter; the denominator standardises this difference for the convenience of comparisons. The sign of rbias reflects over- or under-estimation by the estimator; the absolute value of rbias depicts the magnitude, or severity, of the bias. In our study, we expect to find effect size estimators whose bias rates are close to zero.
Besides the bias rate, another important statistical property of an estimator is its variance. Whereas the bias indicates whether the middle of the sampling distribution falls in line with the real parameter (a location-related property), the variance indicates how concentrated the estimates are around the real parameter (a shape-related property). Sometimes, we are willing to trade location for a "better", tighter shape around the real unknown parameter, so that we reduce the chance of unluckily "missing" the real parameter. Therefore,
we will take both the bias rate and the variance as important evaluation criteria in the comparisons among the various effect size estimators.
The third essential property of an estimator is consistency. An estimator is consistent if, as the sample size increases, the estimates it produces converge in probability to the true value of the parameter being estimated. To be slightly more precise, consistency means that, as the sample size increases, the sampling distribution of the estimator becomes increasingly concentrated at the true parameter value. In our case, we expect to find effect size estimators whose bias rates decrease with increasing sample sizes.
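The bias rate of formula (10) and the consistency property can both be checked with a small Monte Carlo experiment (an illustrative Python sketch; δ = 0.5 and the sample sizes are arbitrary choices for the demonstration):

```python
import numpy as np

def cohens_ds(x1, x2):
    """Cohen's d_s with the pooled SD (formula (2))."""
    n1, n2 = len(x1), len(x2)
    sp2 = ((n1 - 1) * np.var(x1, ddof=1) + (n2 - 1) * np.var(x2, ddof=1)) / (n1 + n2 - 2)
    return (np.mean(x1) - np.mean(x2)) / np.sqrt(sp2)

def bias_rate(n_per_group, delta=0.5, reps=10000, seed=7):
    """Monte Carlo estimate of r_bias (formula (10)) for Cohen's d_s under
    normality with equal variances; the true effect size is delta."""
    rng = np.random.default_rng(seed)
    est = [cohens_ds(rng.normal(delta, 1.0, n_per_group),
                     rng.normal(0.0, 1.0, n_per_group))
           for _ in range(reps)]
    return (np.mean(est) - delta) / delta
```

With n = 10 per group the bias rate is clearly positive (Cohen's ds overestimates δ in small samples), and it shrinks towards zero at n = 100, illustrating consistency.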
3.1.2 Confidence intervals
3.1.2.1 Four confidence intervals
To detect the differences between the parametric and non-parametric interval estimation approaches across conditions, we constructed two intervals around selected effect size point estimators following each of these two approaches: namely, noncentral-t confidence intervals around Cohen's ds and Shieh's d, and bootstrap bias-corrected and accelerated intervals around Cohen's ds and Hedges' gs (the R functions employed to compute the suggested confidence intervals are available in Syntax 1 in the appendix).
• NC t-distribution intervals in Student’s t-test with Cohen’s ds
Consider two independent random samples from two normal populations with means μ1 and μ2, and standard deviations σ1 and σ2, respectively. We wish to test the following hypothesis:

H0: μ1 = μ2 versus Ha: μ1 ≠ μ2

When we assume homogeneity of variances, the test is based on Student's t-statistic:

T = (X̄1 − X̄2) / √( [(n1 − 1)SD1² + (n2 − 1)SD2²] / (n1 + n2 − 2) × (1/n1 + 1/n2) )   (11)

where X̄1, X̄2, SD1, SD2 are the sample estimates of the unknown parameters μ1, μ2, σ1, σ2, respectively. When the null hypothesis is true, T follows a central t-distribution with ν = n1 + n2 − 2 degrees of freedom. However, when the null hypothesis is false, it follows a non-symmetric distribution known as the noncentral t-distribution with ν degrees of freedom and noncentrality parameter λ. The noncentrality parameter is a function of Cohen's ds and the sample sizes:
λ = Cohen's ds × √( n1 n2 / (n1 + n2) )   (12)

Tobs, the observed value of the Student's t-statistic defined in formula (11), is used to estimate the noncentrality parameter λ. By the confidence interval transformation principle (Cumming & Finch, 2001), finding the confidence limits for λ leads to the confidence limits for Cohen's ds. In brief, we first need to find the lower and upper confidence limits for λ (noted λL and λU). λL is obtained by finding the noncentrality parameter whose 1 − α/2 quantile is Tobs; likewise, λU is obtained by finding the noncentrality parameter whose α/2 quantile is Tobs. These lower and upper limits bracket λ with 100(1 − α)% confidence. Once the confidence limits for λ have been obtained, they can be transformed into confidence limits for Cohen's ds by applying formula (13). The confidence interval around Cohen's ds is computed in the following manner:

Prob[ λL √((n1 + n2)/(n1 n2)) ≤ δ ≤ λU √((n1 + n2)/(n1 n2)) ] = 1 − α   (13)

Thus, given that the statistical assumptions are met, equation (13) provides the 100(1 − α)% confidence limits around Cohen's ds.
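The pivoting steps above can be sketched in code. The thesis's actual implementation is the R code of Syntax 1; this Python sketch is illustrative only, and the root-finding bracket `t_obs ± 50` is an assumption of the sketch, not part of the method:

```python
import numpy as np
from scipy.stats import nct
from scipy.optimize import brentq

def nc_ci_cohens_ds(x1, x2, alpha=0.05):
    """Noncentral-t CI for Cohen's ds: invert the noncentral t CDF in the
    noncentrality parameter lambda, then rescale by sqrt((n1+n2)/(n1*n2))
    as in formula (13)."""
    n1, n2 = len(x1), len(x2)
    df = n1 + n2 - 2
    sd_pooled = np.sqrt(((n1 - 1) * np.var(x1, ddof=1) +
                         (n2 - 1) * np.var(x2, ddof=1)) / df)
    t_obs = (np.mean(x1) - np.mean(x2)) / (sd_pooled * np.sqrt(1/n1 + 1/n2))

    def ncp_for(prob):
        # lambda such that the noncentral t CDF at t_obs equals prob
        # (the CDF is decreasing in lambda, so a sign change exists)
        return brentq(lambda lam: nct.cdf(t_obs, df, lam) - prob,
                      t_obs - 50, t_obs + 50)

    lam_lo = ncp_for(1 - alpha / 2)   # lambda_L: 1 - alpha/2 quantile at t_obs
    lam_hi = ncp_for(alpha / 2)       # lambda_U: alpha/2 quantile at t_obs
    scale = np.sqrt((n1 + n2) / (n1 * n2))
    return lam_lo * scale, lam_hi * scale

rng = np.random.default_rng(42)
x1 = rng.normal(0.5, 1.0, 30)
x2 = rng.normal(0.0, 1.0, 30)
lo, hi = nc_ci_cohens_ds(x1, x2)
print(f"95% NC CI for Cohen's ds: [{lo:.3f}, {hi:.3f}]")
```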
• NC intervals in Welch’s t-test with Shieh’s d
Welch’s approximate t procedure has been considered as a satisfactory and robust solution in
the two-sample t under the heterogeneous variances assumption (Delacre et al., 2017).
Welch’s statistic V is given by:
! (14)
As the exact distribution of Welch’s statistic V is comparatively complicated, the practical
importance and methodological complexity of the problem has led to numerous attempts to
develop variance procedures and algorithms for resolving the issue (Kim & Cohen, 1998, and
references therein). To accurate the corresponding interval estimation of the effect size within
the heteroscedastic framework, Shieh (2013) presented an alternative approach to construct
confidence intervals of the standardised mean difference. This approach firstly suggested that
with the same theoretical arguments and analytic derivations as in Welch, the statistic V has
the general approximate distribution:
V ∼ t(ν*, λ*)   (15)

where t(ν*, λ*) denotes a noncentral t-distribution with degrees of freedom ν* and noncentrality parameter λ* = √N × δ* (N is the total number of observations in the two samples and δ* is the theoretical value of Shieh's effect size index, see formula (5)). The degrees of freedom ν* were originally defined by Welch as:

ν* = (σ1²/n1 + σ2²/n2)² / [ (σ1²/n1)²/(n1 − 1) + (σ2²/n2)²/(n2 − 1) ]   (16)

Since this ν* depends on the unknown variances, an approximate version is suggested as:

ν̂ = (SD1²/n1 + SD2²/n2)² / [ (SD1²/n1)²/(n1 − 1) + (SD2²/n2)²/(n2 − 1) ]   (17)

Hence, the adjustment gives the following modified distribution:

V ∼ t(ν̂, √N × δ*)   (18)

The following step, finding the confidence limits of the noncentrality parameter λ*, is similar to the related step in the construction of the NC intervals around Cohen's ds, as previously illustrated. The confidence limits for λ* can be found by using the observed value of the Welch statistic V defined by formula (14). Once the confidence limits λ*L and λ*U have been obtained, they can be transformed into confidence limits for Shieh's δ* by applying the following formula:

Prob[ λ*L / √N ≤ δ* ≤ λ*U / √N ] = 1 − α   (19)
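A minimal sketch of Shieh's interval, following formulas (14), (17) and (19), i.e. assuming the point estimate of Shieh's d equals V/√N. This is illustrative Python rather than the thesis's R code, and the root-finding bracket is again an assumption of the sketch:

```python
import numpy as np
from scipy.stats import nct
from scipy.optimize import brentq

def nc_ci_shieh_d(x1, x2, alpha=0.05):
    """Noncentral-t CI for Shieh's d under heteroscedasticity:
    Welch's V with the approximate df of formula (17), noncentrality
    lambda* = sqrt(N) * delta*, limits rescaled by 1/sqrt(N) (formula (19))."""
    n1, n2 = len(x1), len(x2)
    N = n1 + n2
    v1, v2 = np.var(x1, ddof=1) / n1, np.var(x2, ddof=1) / n2
    V = (np.mean(x1) - np.mean(x2)) / np.sqrt(v1 + v2)            # formula (14)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))  # (17)

    def ncp_for(prob):
        return brentq(lambda lam: nct.cdf(V, df, lam) - prob, V - 50, V + 50)

    lam_lo, lam_hi = ncp_for(1 - alpha / 2), ncp_for(alpha / 2)
    return lam_lo / np.sqrt(N), lam_hi / np.sqrt(N)               # (19)

rng = np.random.default_rng(7)
x1 = rng.normal(1.0, 2.0, 40)   # heteroscedastic, unbalanced groups
x2 = rng.normal(0.0, 1.0, 20)
lo, hi = nc_ci_shieh_d(x1, x2)
print(f"95% NC CI for Shieh's d: [{lo:.3f}, {hi:.3f}]")
```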
• Bootstrap Bias-corrected and accelerated (BS BCa) confidence intervals
The general bootstrap technique is a resampling procedure whereby random samples are
repeatedly drawn from the set of observed data a large number of times to study the
distribution of the statistic(s) of interest given the obtained data. In this study, our interest lies
in examining the distribution of the bootstrapped effect size point estimator d values
calculated from the random samples drawn from the observed data with replacement. An
important feature of this procedure is that it makes no assumption about the parent population
from which the data were drawn other than that the data are randomly sampled and thus
representative of the parent population.
V ∼ t (ν*, λ*)
t (ν*, λ*) ν*
λ* = N × δ* δ*
ν*
ν* = {σ21 /n1 + σ2
2 /n2}2
{σ21 /n1}2 /(n1 − 1) + {σ2
2 /n2}2 /(n2 − 1)
ν*
ν = {SD21 /n1 + SD2
2 /n2}2
{SD21 /n1}2 /(n1 − 1) + {SD2
2 /n2}2 /(n2 − 1)
V ∼ t ( ν, Nδ*)
λ*
ds
λ*
λ*L λ*U
δ*
Prob. [λ*L / N ≤ δ* ≤ λ*U / N] = 1 − α
! of !14 44
As explained in 2.3, we chose the bootstrap bias-corrected and accelerated confidence
intervals (BS BCa) to build the non-parametric intervals around associated effect size point
estimators. Briefly, the computation of the BCa proceeds in three steps (Figure 3.1).
As illustrated in Figure 3.1, we first repeatedly draw random samples with replacement from the observed data B times (B = 1,000 in our study) and calculate the effect size point estimate d* from each bootstrap sample. This yields a vector of B d* values.
Next, we calculate the bias correction value z0 and the acceleration parameter a. z0 is obtained by calculating the proportion of the d* values that are less than the observed d (the effect size point estimate calculated from the observed data) and then finding the quantile of the standard normal distribution with that cumulative probability:

z0 = Φ⁻¹( #(d* < d) / B )   (20)

where Φ is the standard normal cumulative distribution function and Φ⁻¹ its inverse (e.g., Φ(1.96) = 0.975 and Φ⁻¹(0.95) = 1.645), and # is read as "the number of". The acceleration parameter a is computed as follows:

a = Σi=1..N (d̄ − d(−i))³ / ( 6 [ Σi=1..N (d̄ − d(−i))² ]^(3/2) )   (21)

where d(−i) is the value of d recomputed when the ith (i = 1, 2, …, N, with N the total number of observations in the two groups) data point has been deleted (this strategy is also known as jackknife resampling), and d̄ is the mean of the N jackknifed d(−i) values.

The last step is to find the confidence interval quantiles by means of z0 and a, and then the corresponding values from the distribution of d*. Once z0 and a have been calculated, the limits of the confidence interval are found by taking the values from the bootstrap sample that correspond to the CIL and CIU quantiles of the observed bootstrap distribution. The CIL and CIU values are found from the following equations:

CIL = Φ( z0 + (z0 + z(α/2)) / (1 − a(z0 + z(α/2))) )   (22)

and
CIU = Φ( z0 + (z0 + z(1−α/2)) / (1 − a(z0 + z(1−α/2))) )   (23)

such that CIL and CIU represent the quantiles from the distribution of d*. That is, the confidence limits from the BCa approach are obtained by finding the values from the bootstrap distribution of d* that correspond to the CIL and CIU cumulative probabilities. Following this BCa approach, we built two confidence intervals, around Cohen's ds and around its bias-corrected version Hedge's gs. The only difference in the construction procedure is that we apply formula (2) to obtain the observed d value for the confidence interval around Cohen's ds, and formula (3) when building the confidence interval around Hedge's gs.
3.1.2.2 CI width and coverage probability
To evaluate the interval estimations, CI width is well recognised as a good index of precision
because it reflects a number of aspects of the precision of a study, including the amount of
variability in the population, the sample size and thus sampling error, and the amount of error
[Figure 3.1: Computation procedure of the BCa CI. Step 1: resample the observed data with replacement B times (e.g., B = 1,000) to obtain a bootstrap sample of d* values. Step 2: compare d to the d* values to calculate z0, and use jackknife resampling to get a. Step 3: calculate the CIL and CIU quantiles using formulas (20)-(23).]
in the dependent variable (Steiger and Fouladi, 1997; Cumming and Finch, 2001). We define the CI width w as:

w = CIU − CIL   (24)

We did not take the average of the two limits as the value of the width, because the confidence interval is not a symmetric interval around δ under the noncentral t-distribution.
As an indicator of accuracy, the coverage probability is taken as the percentage of confidence intervals whose bounds correctly bracket the population value (% of coverage). The coverage percentage of an ideal confidence interval is expected to equal the specified nominal level, which is 95% in our study.
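Given a list of simulated intervals, the two indices can be tallied as below. This is an illustrative sketch; the interval values are made up for the example:

```python
import numpy as np

def coverage_and_width(intervals, true_delta):
    """Coverage: share of CIs whose bounds bracket the true parameter.
    Width: w = CI_U - CI_L for each interval (formula (24))."""
    intervals = np.asarray(intervals)          # shape (n_sim, 2)
    lo, hi = intervals[:, 0], intervals[:, 1]
    coverage = np.mean((lo <= true_delta) & (true_delta <= hi))
    widths = hi - lo
    return coverage, widths.mean(), widths.std(ddof=1)

# Hypothetical simulated intervals around a true delta of 0.5
sim_cis = [(0.1, 0.9), (0.2, 1.1), (0.6, 1.4), (-0.2, 0.7)]
cov, mean_w, sd_w = coverage_and_width(sim_cis, 0.5)
print(cov, mean_w)   # 3 of the 4 intervals bracket 0.5, so cov = 0.75
```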
3.1.3 Type I error and power for the statistical tests
In the simulation study of the statistical tests, we need to calculate the type I error rate and the empirical power of each specific test. Following the definitions previously stated in 2.4, we obtained the type I error rate as the proportion of rejections (p-value < 0.05, taking the significance level to be 0.05) among the total number of simulations when the null hypothesis is true. Similarly, we calculated the power as the proportion of rejections among the total number of simulations when the null hypothesis is false.
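This tallying procedure can be sketched as follows, with a reduced number of replications for illustration (the study itself uses 1,000,000 replications in R, and the configuration below is an example, not one of the 243 scenarios):

```python
import numpy as np
from scipy.stats import ttest_ind

def type1_error_rate(n1, n2, sigma1, sigma2, n_sim=2000,
                     alpha=0.05, welch=False, seed=11):
    """Proportion of rejections (p < alpha) when H0: mu1 = mu2 is true."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_sim):
        x1 = rng.normal(0.0, sigma1, n1)
        x2 = rng.normal(0.0, sigma2, n2)
        p = ttest_ind(x1, x2, equal_var=not welch).pvalue
        rejections += p < alpha
    return rejections / n_sim

# Unbalanced design where the larger group has the smaller variance:
# Student's t-test tends to become liberal here, Welch's stays near alpha
print(type1_error_rate(40, 20, 1.0, 2.0, welch=False))
print(type1_error_rate(40, 20, 1.0, 2.0, welch=True))
```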
Additionally, we calculated two sets of required sample sizes to achieve the same power of 80% with Student's t-test or Welch's t-test. The first calculation is conducted by controlling the effect sizes; more precisely, we compare the smallest numbers of observations that these two t-tests require to attain a power of 80%. For each effect size setting, we found the associated mean differences (calculated from the effect size) and the required sample size of the control group to detect the specific mean difference. The second situation compares the required sample sizes to attain a power of 80% to detect the same difference between two group means. The R functions to calculate the minimum sample sizes are available in Syntax 2 in the appendix.
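For the balanced Student's t-test case, the sample size search can be sketched with the noncentral t power function implied by formulas (11)-(12). This Python sketch is illustrative and is not the Syntax 2 R code:

```python
import numpy as np
from scipy.stats import nct, t as t_dist

def power_student(delta, n1, n2, alpha=0.05):
    """Power of the two-sided Student t-test for standardised difference delta:
    P(|T| > t_crit) where T is noncentral t with lambda from formula (12)."""
    df = n1 + n2 - 2
    lam = delta * np.sqrt(n1 * n2 / (n1 + n2))     # formula (12)
    t_crit = t_dist.ppf(1 - alpha / 2, df)
    return (1 - nct.cdf(t_crit, df, lam)) + nct.cdf(-t_crit, df, lam)

def min_n_per_group(delta, target=0.80):
    """Smallest balanced per-group n reaching the target power."""
    n = 2
    while power_student(delta, n, n) < target:
        n += 1
    return n

print(min_n_per_group(0.5))   # should be close to 64 per group for delta = 0.5
```

A Welch analogue follows the same search, swapping in the Welch statistic and the approximate degrees of freedom of formula (17).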
3.1.4 Summary
The methodologies to compare effect size measures, confidence interval approaches and
statistical tests under different assumptions are summarised in Table 3.1. In summary, the
analysis consists of three aspects: a) the bias, consistency and variance of the point estimates of the effect size measures; b) the coverage probability and interval widths of the interval estimations; c) the type I error rates, power and required sample sizes to achieve the expected power level of the hypothesis tests.
Table 3.1: Summary of evaluation criteria and key statistics

Point estimates — Bias and consistency: rbias = (E(d) − δ)/δ; Variance: Var(d)
Confidence intervals — Precision: w = CIU − CIL; Accuracy: % of coverage of the true parameter
NHST — Type I error: #(p < 0.05 under H0) / Nsim; Power: #(p < 0.05 under Ha) / Nsim
3.2 Methods of the Monte Carlo simulations
Guided by the specific research interests of the evaluation process, we conducted a series of Monte Carlo simulations covering three aspects.
3.2.1 Simulation design for the study of point estimators
We created 243 scenarios with varying sample sizes, population standard deviations and the
true mean differences (Table 3.2). For both normal and non-normal distributions, we chose
increasing sample sizes from very limited observations (e.g., only 10 observations in each
group) to large numbers (e.g., at least 50 observations in each group) in combination with
increasing ratios between sample sizes. Without loss of generality, we set the second group as the control group with constant configurations of the mean (μ2 = 0) and the standard deviation (σ2 = 1). We give fewer configurations for each non-normal distribution, as the resulting number of total distributions is still comparable to that of the normal distributions. For the parameters of non-normal distributions that are not defined by mean and standard deviation, we chose configurations that show apparent non-normality; the mean and standard deviation are then calculated from the non-normal parameters such as scale, shape or skewness.
Three non-normal distributions were chosen based on a consideration of their value range and
application frequency in psychological studies. They are Skew-normal (Azzalini, 1985), Sinh-
arcsinh (Jones and Pewsey, 2009) and Gamma distributions (see distribution illustrations in
Figure 1 in appendix). Each scenario is based on 100,000 simulations of datasets in the
program R.
Table 3.2: Simulation design for the study of point estimators

Control group - Experimental group | Scenarios | μ1 − μ2 | SD ratio (σ1/σ2)
Normal - Normal | 162 | 0.1; 0.5; 0.8; 1.5; 2; 5 | 0.5; 1; 1.5
Normal - Skew-normal | 18 | -2; -1 | 1; 2
Skew-normal - Skew-normal | 9 | -1 | 2
Normal - SAS normal | 18 | -0.7; 1.6 | 1.1; 1.6
SAS normal - SAS normal | 9 | 2.3 | 1.4
Normal - Gamma | 18 | -2; -1 | 0.7; 1.4
Gamma - Gamma | 9 | 1 | 2

Sample parameters for all rows: total sample size = 20; 25; 30; 40; 50; 60; 100; 125; 150, with sample size ratio (n1/n2) = 1; 1.5; 2.
For the effect size measures that are expressed in terms of medians (e.g., dMAD) or trimmed variances (dR), it is very complicated to obtain the true parameters of the population, since the simulated distributions are not defined by medians (if the data do not follow a normal distribution) or by any trimmed values. As a solution, we generated a large random sample of 100,000 observations as the population pool, and then randomly drew samples of restricted size from this parent pool for the later analysis. As we have the entire population dataset, it is straightforward to obtain the values of the medians, the trimmed variances and the winsorized standard deviations.
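The population-pool trick can be sketched as follows. This is illustrative Python; the skew-normal shape a = 5 and the 20% winsorizing proportion are example choices, not the thesis's exact settings:

```python
import numpy as np
from scipy.stats import skewnorm

# Build a large "population pool" from a skew-normal distribution, then
# read the parameters that have no closed form in our design (medians,
# winsorized spread) directly off the pool.
rng = np.random.default_rng(2024)
pool = skewnorm.rvs(a=5, size=100_000, random_state=rng)

true_median = np.median(pool)

def winsorized_sd(x, prop=0.2):
    """SD after clamping the lowest/highest `prop` share of the data."""
    lo, hi = np.quantile(x, [prop, 1 - prop])
    return np.clip(x, lo, hi).std(ddof=1)

true_wsd = winsorized_sd(pool)

# Later analyses draw restricted-size samples from this parent pool
sample = rng.choice(pool, size=30, replace=False)
print(true_median, true_wsd, sample.mean())
```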
3.2.2 Simulation design for the study of confidence intervals
For the part on confidence intervals, we mainly focus on the comparison of four confidence intervals: the NC procedure with Cohen's ds in Student's t-test, the NC procedure with Shieh's d in Welch's t-test, and the BS BCa method with Cohen's ds and with Hedge's gs. We do not apply the bootstrapping procedures to all the other indices, because we are more interested in detecting the difference between the NC and BS BCa approaches than in exploring the minor differences between different estimators under one procedure. On the other hand, an economical research design is always preferred in practice, since the computation of bootstrap procedures under a large number of simulation replications requires a huge amount of time and resources. As a result, we selected a few settings that represent the sensitive situations when the assumptions are met or violated (see an overview of the configurations in Table 3.3).
Table 3.3: Configurations in the simulation study of confidence intervals

Effect size: 0; 0.2; 0.5; 1; 2 (crossed with each row below)

n1 | n2 | SD1 | SD2
10 | 10 | 1 | 1
10 | 10 | 2 | 1
20 | 10 | 1 | 1
20 | 10 | 2 | 1
75 | 50 | 1 | 1
75 | 50 | 2 | 1
With the given sample sizes and parameter configurations, estimates of the true coverage probability are computed through Monte Carlo simulation of 100,000 independent datasets. For each replicate, the four sets of confidence limits associated with the two-sided lower and upper 100(1 − α)% confidence intervals are computed, together with their coverage probability, the interval widths, the standard deviation of the interval widths, the means and medians of the interval widths and interval limits, and the empirical powers.
3.2.3 Simulation design for the study of hypothesis tests
Considering that computing the type I error rate and power requires a large number of replications, the type I error rates and empirical powers are calculated from 1,000,000 simulations under each of the 243 scenarios (see configurations in Table 3.2).
In addition, we illustrate the different p-value distributions of Student's t-test, Welch's t-test and the Mann-Whitney U test under typical situations where problems easily occur. The configurations of this illustration cover cases representing: very small samples with a balanced design and unequal variances; very small samples with an unbalanced design but equal variances; and big samples with an unbalanced design and unequal variances.
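The p-value illustration can be sketched as below, in illustrative Python with a reduced number of replications; the configuration shown is one example of the "small balanced samples with unequal variances" case, not necessarily the thesis's exact setting:

```python
import numpy as np
from scipy.stats import ttest_ind, mannwhitneyu

def pvalue_distributions(n1, n2, sigma1, sigma2, n_sim=2000, seed=9):
    """p-values of the three tests under H0 (equal means); a well-behaved
    test keeps the p-value distribution close to uniform here."""
    rng = np.random.default_rng(seed)
    p_student, p_welch, p_mw = [], [], []
    for _ in range(n_sim):
        x1 = rng.normal(0, sigma1, n1)
        x2 = rng.normal(0, sigma2, n2)
        p_student.append(ttest_ind(x1, x2, equal_var=True).pvalue)
        p_welch.append(ttest_ind(x1, x2, equal_var=False).pvalue)
        p_mw.append(mannwhitneyu(x1, x2, alternative="two-sided").pvalue)
    return map(np.asarray, (p_student, p_welch, p_mw))

# very small, balanced samples with unequal variances
ps, pw, pmw = pvalue_distributions(10, 10, 1.0, 4.0)
for name, p in [("Student", ps), ("Welch", pw), ("Mann-Whitney", pmw)]:
    print(name, round(np.mean(p < 0.05), 3))
```

Plotting histograms of `ps`, `pw` and `pmw` reproduces the kind of p-value distribution comparison described in the text.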
In the power analysis part, we calculated two types of minimum sample sizes to obtain a power of 80%, by controlling either the effect size or the group means. As larger effect sizes or mean differences would not be a practical concern, we only chose values of the effect size or the mean difference that are less than 1.
4 Results
4.1 Bias, consistency and variance of the point estimators
In order to investigate how Cohen's ds and other effect size indices react across the
assumptions of normality and homogeneity of variances, Monte Carlo simulations of 243
scenarios were conducted. For each of the 243 scenarios (see configurations in Table 3.2), we
examined the bias rates and variances of the proposed effect size estimators related to specific
condition (see effect size measures in Table 1.1). As a result, the study covers four different
assumptions: Assumption 1 - both normality and homogeneity of variances are met (see
detailed results in Table 4.1 in appendix); Assumption 2 - only normality is met (Table 4.2 in
appendix); Assumption 3 - only homogeneity is met (Table 4.3 in appendix); Assumption 4 -
non-normality and heterogeneity of variances (Table 4.4 in appendix). As previously explained (see 3.1.1), we are mainly interested in observing the bias rate and variance of the estimators, as well as the trend of the bias rate with increasing sample sizes, for the comparisons among the various effect size measures.
4.1.1 Cohen’s ds, Hedge’s gs, Glass’s ds and Shieh’s under Assumption 1
We first compare the performance of the effect size estimators under Assumption 1 (normality and homogeneity of variances) in terms of bias rate and variance. Figure 4.1 shows that estimator bias tends to decrease, and precision improves, with increasing sample sizes. For samples with more than 50 observations, all effect size estimators except Glass's ds are asymptotically unbiased, with bias rates of less than 1%.
For groups with the same number of observations, the effect size estimators tend to be more biased and more variable as the difference between the two group means enlarges, with the exception of the case n1 : n2 = 40 : 20 and μ1 − μ2 = 0.1 (Figure 4.1).
Among the effect size indicators, Glass's ds shows the least precision and the highest bias rates (Figure 4.1). Cohen's ds shows the lowest bias rates under all configurations, but not the best precision. As the bias-corrected version, Hedge's gs produces similar bias rates and slightly smaller variances compared to Cohen's ds. Shieh's d, on the other hand, demonstrates comparable bias rates to Cohen's ds and Hedge's gs, but significantly outperforms them in precision (Figure 4.1). Notably, the bias rate also depends on the experimental design. When we are working with two balanced groups, Cohen's ds and Shieh's d constantly produce the same
bias rates, which are much smaller than those of Glass's ds. This is no surprise, since the relationship between Cohen's ds and Shieh's d, assuming equal sample sizes and standard deviations, can be deduced as: Cohen's ds = 2 × Shieh's d and Var(Cohen's ds) = 4 × Var(Shieh's d). On the other hand, for two unbalanced groups, Cohen's ds is noticeably less biased than Shieh's d, but this difference decreases with increasing sample sizes and does not appear to be associated with the effect size magnitude (Figure 4.1). Nevertheless, Shieh's d consistently shows smaller variance in the sample estimates than Cohen's ds. Therefore, Shieh's d is preferred due to its comparable bias rates but significantly smaller variances compared to Cohen's ds.
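The stated relationship is easy to verify numerically, assuming Shieh's d is computed as (X̄1 − X̄2)/√(N(SD1²/n1 + SD2²/n2)) (our reading of formula (5)); under that assumption the ratio is exactly 2 for any balanced design:

```python
import numpy as np

def cohens_ds(x1, x2):
    n1, n2 = len(x1), len(x2)
    sp = np.sqrt(((n1 - 1) * np.var(x1, ddof=1) +
                  (n2 - 1) * np.var(x2, ddof=1)) / (n1 + n2 - 2))
    return (np.mean(x1) - np.mean(x2)) / sp

def shiehs_d(x1, x2):
    n1, n2 = len(x1), len(x2)
    N = n1 + n2
    denom = np.sqrt(N * (np.var(x1, ddof=1) / n1 + np.var(x2, ddof=1) / n2))
    return (np.mean(x1) - np.mean(x2)) / denom

rng = np.random.default_rng(5)
x1, x2 = rng.normal(0.5, 1, 20), rng.normal(0.0, 1, 20)  # balanced design
print(cohens_ds(x1, x2) / shiehs_d(x1, x2))   # exactly 2.0 when n1 = n2
```

For n1 = n2 = n, Shieh's denominator is √(2(SD1² + SD2²)) while the pooled SD is √((SD1² + SD2²)/2), so the ratio of the two estimators is 2 regardless of the sample SDs.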
4.1.2 Cohen’s ds, Hedge’s gs, Glass’s ds and Shieh’s d under Assumption 2
We next investigate the performance of these estimators under the assumption of normality but with heterogeneity of variances. As under Assumption 1, the bias rate and variance of all estimators tend to decrease with increasing sample size (Figure 4.2). Cohen's ds always behaves similarly to Hedge's gs in both bias rate and precision, but is clearly less biased than Glass's ds and less precise than Shieh's d (the relationship between the variances of Cohen's ds and Shieh's d can be derived as: Var(Cohen's ds) = 4 × Var(Shieh's d)) (Figure 4.2). Overall, Shieh's d is the best estimator due to its comparable bias rate but clearly lower variance (Figure 4.2).
[Figure 4.1: Assumption 1 - Normality and homogeneity of variances. Bias rate and variance (SDR = 1) of Cohen's ds, Hedge's gs, Glass's ds and Shieh's d for μ1 − μ2 = 0.1, 0.5, 1.5 and n1:n2 = 10:10, 20:10, 20:20, 40:20, 50:50, 100:50. SDR: standard deviation ratio.]
Remarkably, the simulation results reveal that the sample size allocation and the standard deviation ratio may jointly influence the behaviour of Shieh's d (Figure 4.2, Table 4.2 in appendix). Given unbalanced designs, if the standard deviation is smaller in the group with more observations (e.g. n1 : n2 = (20 : 10), (40 : 20) or (100 : 50) with σ1 : σ2 = 0.5), Cohen's ds / Hedge's gs is less biased than Shieh's d; otherwise, Shieh's d shows the lowest bias rate, which is similar to that of Cohen's ds / Hedge's gs (Figure 4.2).
4.1.3 Cohen’s ds, Hedge’s gs, Shieh’s d, dMAD, dR and PSindep under Assumptions 3 and 4
When the assumption of normality is violated, both the bias rates and the variances of all effect size estimators again decrease with increasing sample sizes (Figures 4.3 - 4.4). This is consistent under both the homoscedastic and heteroscedastic frameworks. The non-parametric estimator PSindep shows the lowest bias rates with the smallest variances among all the proposed effect size indices (Figures 4.3 - 4.4), while among the parametric effect size indices, Shieh's d is still the best indicator of effect size, producing comparable bias rates but smaller variances under all configurations (Figures 4.3 - 4.4). These findings are consistent under the skew-normal, SAS normal and Gamma distributions. Notably, dMAD is significantly biased in all cases; this may be because our configured non-normal distributions are not severely skewed, which is where this effect size measure shows its greatest advantages.
[Figure 4.2: Assumption 2 - Normality and heterogeneity of variances. Bias rate and variance of Cohen's ds, Hedge's gs, Glass's ds and Shieh's d for SDR = 0.5 and SDR = 1.5, μ1 − μ2 = 0.1, 0.5, 1.5 and n1:n2 = 10:10, 20:10, 20:20, 40:20, 50:50, 100:50. SDR: standard deviation ratio.]
[Figure 4.3: Assumption 3 - Non-normality and homogeneity of variances. Bias rate and variance (μ1 − μ2 = -1, SDR = 1) of Cohen's ds, Hedge's gs, Shieh's d, dMAD, dR and PSindep for n1:n2 = 10:10, 20:10, 20:20, 40:20, 50:50, 100:50. SDR: standard deviation ratio.]
[Figure 4.4: Assumption 4 - Non-normality and heterogeneity of variances. Bias rate and variance of Cohen's ds, Hedge's gs, Shieh's d, dMAD, dR and PSindep under the skew-normal, sinh-arcsinh normal and Gamma distributions, for various (μ1 − μ2, SDR) configurations and n1:n2 = 10:10, 20:10, 20:20, 40:20, 50:50, 100:50. SDR: standard deviation ratio.]
Across the different assumptions, it is also observed that the behaviour of Cohen's ds in terms of bias rates and variances varies with the conditions (Figures 4.1 - 4.4). When the assumptions of normality and homogeneity of variances are both met, Cohen's ds / Hedge's gs is consistently less biased under both balanced and unbalanced experimental designs compared to the other two effect size measures (Figure 4.1, Table 4.1 in appendix). However, it tends to be more biased under Assumption 2 given the same configurations as Assumption 1, and commits very high bias rates (e.g. 11.42% under skew-normal distributions, 14.86% under Gamma distributions) when the assumptions of normality and homogeneity are both violated (Table 4.4 in appendix). Although the bias rate of Cohen's ds decreases overall with growing numbers of observations, it still seems relatively high compared to the simulation results for the same sample sizes under Assumption 1 (Figure 4.4, Table 4.4 in appendix). In addition, it always suffers from larger variances than Shieh's d under all assumptions. Notably, we cannot provide further discussion on the exact within-estimator differences across conditions, since the parameter configurations are not unified between the normal (Assumptions 1 and 2) and non-normal distributions (Assumptions 3 and 4). This will be further discussed in the Discussion section.
4.2 Accuracy and precision of confidence intervals
As highlighted in 2.3, reporting the associated confidence interval around the effect size point estimator provides all the information that p-values do, and more. Therefore, besides the examination of the different effect size measures, we also constructed four confidence intervals following the parametric and non-parametric approaches, and compared the interval estimates in terms of coverage probability, which indicates the accuracy of the estimates, and CI width, which indicates their precision.
The simulation study makes comparisons among the following four confidence intervals (the construction methodologies are described in 3.1.2.1): the NC t-distribution CI around Cohen's ds assuming homogeneity of variances; the NC t-distribution CI around Shieh's d under the heteroscedastic framework; the BS BCa CI around Cohen's ds; and the BS BCa CI around Hedge's gs. To avoid the large computing load that would be caused by the bootstrap procedures, and also to prevent repetition across similar assumption cases, we did not repeat all the scenarios of the simulation study of the point estimators, but only the representative configurations that show the most potentially problematic situations in reality.
The parameter configurations under the normal and skew-normal distributions are the same in each scenario: variances (σ1², σ2²) = (1,1) and (2,1), and sample sizes (n1, n2) = (10,10), (20,10) and (75,50). These settings not only include both homoscedastic/heteroscedastic and balanced/unbalanced designs, but also create direct and inverse pairings between the variance and sample size structures. Overall, these considerations result in a total of six different joint configurations for each of the parametric and non-parametric simulations. Without loss of generality, the second group mean is fixed as μ2 = 0, and the first group mean μ1 is chosen such that the standardised mean difference Cohen's ds = 0, 0.2, 0.5, 1, 2, representing small, medium and large effect sizes, for each combined structure of (σ1², σ2²) and (n1, n2). Detailed results of the Monte Carlo simulations under each of the 60 scenarios are listed in Tables 4.5 - 4.16 in the appendix. We illustrate the two most important statistics of the confidence intervals (i.e., coverage probability and interval width) in Figures 4.5 and 4.6.
Overall, the NC method outperforms the BS BCa method either when the assumptions of normality and homogeneity of variances are both met or under a true null hypothesis. When the null hypothesis is true, NC intervals around Cohen's ds always come closest to the 95% coverage percentage under Assumption 1; otherwise, NC intervals around Shieh's d mostly achieve the best accuracy, with two exceptions involving unbalanced designs with equal variances under small samples (Tables 4.13 and 4.15 in appendix). When there is a real difference between the two groups (the alternative hypothesis is true), NC intervals around Shieh's d give the most accurate estimations for small and medium effect sizes (δ = 0.2 and 0.5) under smaller samples, and for all ranges of effect sizes when the sample size gets larger (Tables 4.5, 4.7 and 4.9 in appendix). When the assumption of homogeneity of variances is violated, the NC interval with Shieh's d shows distinct advantages in terms of coverage probability compared to all the other intervals, across all experimental designs and sample sizes (Figure 4.5, Tables 4.6, 4.8, 4.10 in appendix). Thus, the NC confidence interval around Shieh's d performs best in terms of coverage probability either when the parametric assumptions are met or when the null hypothesis is true.
When the parametric assumptions are violated, BS BCa intervals around Cohen's ds provide coverage percentages closest to 95% when the number of observations is large (Tables 4.15 and 4.16 in appendix). Noticeably, there is an under-coverage problem with BS BCa intervals around Hedge's gs when the two groups contain moderate numbers of observations, but the BS approach tends to approximate satisfying coverage percentages when the sample size gets larger (Figure
4.5, Tables 4.5 - 4.16 in appendix). This also suggests the importance of an adequate number of independent pieces of information in the framework of resampling. However, if the observations are relatively limited, the NC interval with Shieh's d still gives overall the most satisfying coverage percentages (Figure 4.5, Tables 4.11, 4.12 in appendix). Therefore, when the assumption of normality is violated, BS BCa intervals around Cohen's ds offer the most accurate interval estimations with adequate observations (e.g., at least 50 observations in each group), while the parametric NC intervals around Shieh's d achieve satisfying coverage probabilities with small sample sizes (e.g., no more than 20 observations in each group).
In terms of the precision of the estimations, overall, all intervals tend to give more precise estimations with increasing sample sizes (Figure 4.6). Moreover, the mean widths of all the proposed intervals broaden with enlarging effect sizes. In comparison, the NC interval around Shieh's d achieves the narrowest widths across all assumptions, while the NC intervals with
[Figure 4.5: Coverage (%) of the confidence intervals (NC - Cohen's ds, NC - Shieh's d, BS BCa - Cohen's ds, BS BCa - Hedge's gs) under Assumptions 1 and 2 and under Assumptions 3 and 4, for effect sizes 0, 0.2, 0.5, 1, 2; n1:n2 = 10:10, 20:10, 75:50; σ1 = σ2 = 1 and σ1 = 2, σ2 = 1.]
Cohen’s ds are wider than twice the length of the NC intervals with Shieh’s d. In addition, the BS intervals are also more precise than the NC intervals around Cohen’s ds.
To sum up, the coverage probability and interval width of a given CI vary across assumptions, experimental designs and sample sizes. Accordingly, the choice of interval approach should take all of these factors, as well as the effect size magnitude if available, into account.
Based on our observations in the simulation study, NC intervals around Shieh’s d maintain stable coverage percentages that are very close to the expected nominal level (95% in this study) across all conditions under the normality assumption. When the parametric assumption is violated, this interval approach still has advantages under balanced designs, but under unbalanced designs it can yield misleading confidence interval limits such that the empirical coverage is greater or less than the specified nominal coverage. On the other hand, BS BCa intervals are preferred under non-parametric conditions, especially when the number of observations is adequate.
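To illustrate, a BS BCa interval around Cohen’s ds can be obtained in a few lines. The sketch below uses Python’s scipy.stats.bootstrap rather than the R code used in this study; the data, group sizes and seed are arbitrary choices for the example:

```python
import numpy as np
from scipy import stats

def cohens_ds(x, y):
    """Cohen's ds: mean difference standardised by the pooled SD."""
    n1, n2 = len(x), len(y)
    sp = np.sqrt(((n1 - 1) * np.var(x, ddof=1) + (n2 - 1) * np.var(y, ddof=1))
                 / (n1 + n2 - 2))
    return (np.mean(x) - np.mean(y)) / sp

rng = np.random.default_rng(1)
x = rng.normal(0.5, 1.0, size=75)  # group 1
y = rng.normal(0.0, 1.0, size=50)  # group 2

# BCa bootstrap interval (95% by default); each group is resampled
# independently, matching the two-independent-groups design
res = stats.bootstrap((x, y), cohens_ds, vectorized=False,
                      n_resamples=2000, method='BCa', random_state=1)
print(res.confidence_interval)
```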
[Figure 4.6: Average widths (0-3) of the four confidence intervals (NC - Cohen’s ds, BS BCa - Cohen’s ds, NC - Shieh’s d, BS BCa - Hedge’s gs) under assumptions 1 and 2 and assumptions 3 and 4, for effect sizes 0-2, sample sizes n1:n2 = 10:10, 20:10 and 75:50, and σ1 = σ2 = 1 versus σ1 = 2, σ2 = 1.]
4.3 Statistical tests to compare two groups
As highlighted in 2.4, effect sizes are always associated with inferential statistical tests, and these tests react differently under different assumptions in terms of the rates of committing type I and type II errors. In the context of comparing two independent group means, Student’s t-test and Welch’s t-test are the most commonly used significance tests, depending on the assumptions: the former applies if we can assume homogeneity of variances in the two groups, while the latter offers more flexibility when we cannot assume equal variances. Accordingly, all the parametric effect size measures can be connected to these two t-tests through the consistency of their assumptions, e.g., Cohen’s ds with Student’s t-test, Shieh’s d with Welch’s t-test, etc. When the number of observations is extremely small and the data do not follow a normal distribution, the Mann-Whitney U test is widely used as an alternative to the t-tests, although it compares two group distributions rather than two group means. We consider that these three tests cover the range of effect size measures of interest. As a complement to the study of effect size measures and CI approaches, we explore the differences between these tests in terms of type I error rates, empirical powers and required sample sizes.
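For concreteness, the three tests can be run side by side. A minimal sketch in Python with SciPy (the simulated data and seed are arbitrary; the thesis simulations themselves were run in R):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
g1 = rng.normal(0.5, 2.0, size=40)  # group 1: larger variance
g2 = rng.normal(0.0, 1.0, size=25)  # group 2

# Student's t-test: assumes equal variances (pooled SD)
t_s, p_s = stats.ttest_ind(g1, g2, equal_var=True)

# Welch's t-test: drops the homogeneity-of-variances assumption
t_w, p_w = stats.ttest_ind(g1, g2, equal_var=False)

# Mann-Whitney U test: compares the two distributions, not the means
u, p_u = stats.mannwhitneyu(g1, g2, alternative='two-sided')

print(f"Student: t={t_s:.3f}, p={p_s:.4f}")
print(f"Welch:   t={t_w:.3f}, p={p_w:.4f}")
print(f"Mann-Whitney: U={u:.1f}, p={p_u:.4f}")
```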
4.3.1 Type I error rates
Monte Carlo simulation results of type I error rates and empirical powers under each of 243
scenarios (see configurations in Table 3.2) are based on 1,000,000 replications in the program
R (see detailed simulation results in Table 4.17 - 4.20 in appendix). Table 4.1 summarises the
simulation results of type I error rates across four assumptions. When the assumptions are
met, Student’s t-test is obviously better than Welch’s t-test, because it produces stable type I
error rates (SD( ! )=0.2%) which exactly yield the expected nominal level ( ! =5%) on average.
Welch’s t-test generates more various type I error rates (SD( ! )=0.7%) slightly further from
5% (Mean( ! )=0.0494) under balanced designs. Furthermore, Welch’s t-test obviously
outperforms Student’s t-test when the assumption of homogeneity of variances is violated
(Assumption 2). It produces the lowest average type I error rate and with smaller variance on
average, which indicates a stable level of the observed type I error rates. The simulation result
indicates that the Welch t-test appears to have a stable type I error rate around the expected
alpha level even under non-normal conditions. On the other hand, we do expect a better
performance of Mann-Whitney U test under non-normal conditions but the result doesn’t
show evidence. This is mainly due to the difference of the null hypothesis between t-tests and
Mann-Whitney U test. The Mann-Whitney U test compares two distributions rather than two group means, whereas in the simulation studies we generated the true null hypothesis by drawing random samples with equal means. Moreover, the undesirably high type I error rates produced by the Mann-Whitney U test again confirm its lack of immunity to heteroscedasticity.
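The heteroscedasticity effect described above is easy to reproduce. The following sketch is a small-scale Python analogue of our R simulations (the replication count and seed are arbitrary); it estimates type I error rates when the smaller group has the larger standard deviation:

```python
import numpy as np
from scipy import stats

def type1_rates(n1, n2, sd1, sd2, reps=10000, alpha=0.05, seed=0):
    """Empirical type I error rates of both t-tests under a true null
    (equal means), for given group sizes and standard deviations."""
    rng = np.random.default_rng(seed)
    rej_student = rej_welch = 0
    for _ in range(reps):
        x = rng.normal(0.0, sd1, n1)
        y = rng.normal(0.0, sd2, n2)
        if stats.ttest_ind(x, y, equal_var=True).pvalue < alpha:
            rej_student += 1
        if stats.ttest_ind(x, y, equal_var=False).pvalue < alpha:
            rej_welch += 1
    return rej_student / reps, rej_welch / reps

# Unbalanced design where the smaller group has the larger SD:
# Student's t-test is expected to be liberal, Welch's to stay near 5%
a_student, a_welch = type1_rates(n1=10, n2=20, sd1=2.0, sd2=1.0)
print(f"Student: {a_student:.4f}  Welch: {a_welch:.4f}")
```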
Note. Mean(α) is the average of the observed type I error rates in the simulation studies, and SD(α) is their standard deviation. Cell colours indicate how far the observed α deviates from the significance level (α = 0.05 in this study), i.e., a deeper colour indicates a shorter distance between the observed type I error rate and 0.05. The cut-off values defining the colours are 0.05, 0.001, 0.005, 0.01.
These findings are confirmed by simulation results for the p-values under the true null hypothesis, based on 1,000,000 independent tests. In simulating the p-values of the three tests mentioned above, we configured small samples in both homoscedastic and heteroscedastic frameworks and large samples with unequal variances, in combination with balanced and unbalanced designs (Table 4.2).
When the assumptions are met (Assumption 1), i.e., two normally distributed groups with equal variances, the type I error rate (the probability of rejecting the null hypothesis when it is true) is very similar for Student’s t-test and Welch’s t-test, even under unbalanced
Table 4.1: Simulation results of type I error rates of Student’s t-test, Welch’s t-test and the Mann-Whitney U test

Assumption  Distribution  Design      Student Mean(α)/SD(α)  Welch Mean(α)/SD(α)  Mann-Whitney U Mean(α)/SD(α)
1           Normal        Balanced    0.0500 / 0.0002        0.0494 / 0.0007      -
1           Normal        Unbalanced  0.0500 / 0.0002        0.0499 / 0.0003      -
2           Normal        Balanced    0.0519 / 0.0014        0.0499 / 0.0004      -
2           Normal        Unbalanced  0.0646 / 0.0354        0.0500 / 0.0006      -
3           Skew-normal   Balanced    0.0508 / 0.0009        0.0501 / 0.0001      0.0555 / 0.0086
3           Skew-normal   Unbalanced  0.0508 / 0.0007        0.0499 / 0.0002      0.0590 / 0.0074
4           Skew-normal   Balanced    0.0570 / 0.0041        0.0546 / 0.0028      0.1050 / 0.0381
4           Skew-normal   Unbalanced  0.0264 / 0.0067        0.0510 / 0.0012      0.0946 / 0.0425
4           SAS           Balanced    0.0498 / 0.0004        0.0490 / 0.0011      0.0472 / 0.0031
4           SAS           Unbalanced  0.0498 / 0.0002        0.0500 / 0.0005      0.0487 / 0.0008
4           Gamma         Balanced    0.0592 / 0.0124        0.0578 / 0.0117      0.1530 / 0.1800
4           Gamma         Unbalanced  0.0505 / 0.0191        0.0517 / 0.0030      0.1760 / 0.2110
designs (Figure 4.7(a)-1). Student’s t-test produced slightly more stable and lower type I error rates than Welch’s t-test when the sample size was extremely small (only 5 observations in one group), but this difference quickly disappeared once there were at least 10 observations in each group. Thus, when the assumptions of normality and homogeneity of variances are met, Student’s t-test is slightly better than Welch’s t-test in terms of type I error rates for extremely small sample sizes (n < 15); otherwise, the two t-tests perform similarly.
When we can only assume normality (Assumption 2), different p-value distributions emerge for the two t-tests even under balanced designs (Figure 4.7(a)-2), and Welch’s t-test achieves the more stable and lower type I error rates (Figures 4.7(a)-3 and 4.7(a)-4). Moreover, the distinction becomes more pronounced as the difference between the two group variances increases, even with larger numbers of observations. Student’s t-test yields results that are either too conservative or too liberal due to the heteroscedasticity, while the observed p-values of Welch’s t-test maintain a consistently stable distribution (Figure 4.7(a)-4). Therefore, Welch’s t-test is preferred to Student’s t-test under Assumption 2.
Table 4.2: Configurations in the simulation study of p-values

Assumption  Distribution  Description                                    n1 : n2  σ1 : σ2
1           Normal        unbalanced small samples, equal variances      15:10    1:1
2           Normal        balanced small samples, unequal variances      10:10    0.5:2
2           Normal        unbalanced big samples, unequal variances      75:50    0.5:1
2           Normal        unbalanced big samples, unequal variances      100:50   3:1
3           Skew-normal   unbalanced small samples, equal variances      15:10    1:1
4           Skew-normal   balanced small samples, unequal variances      10:10    2:1
4           Skew-normal   unbalanced big samples, unequal variances      75:50    2:1
4           Skew-normal   unbalanced big samples, unequal variances      50:100   2:1
3           Gamma         unbalanced small samples, equal variances      15:10    2:2
4           Gamma         balanced small samples, unequal variances      10:10    1:0.5
4           Gamma         unbalanced big samples, unequal variances      75:50    2:0.5
4           Gamma         unbalanced big samples, unequal variances      50:100   1:0.5
The findings are consistent under skew-normal distributions (Figure 4.7(b)). Both Welch’s t-test and Student’s t-test maintain stable type I error rates either when the assumption of homogeneity of variances is met or under balanced designs, even with very small sample sizes. On the other hand, when heteroscedasticity is combined with unbalanced group sizes, even with adequate observations (e.g., n ≥ 50 for
[Figure 4.7: Histograms of the observed p-values of Student’s t-test, Welch’s t-test and the Mann-Whitney U test under (a) Normal, (b) Skew-normal and (c) Gamma distributions. Panels (1)-(4) correspond to the sample size and standard deviation ratio (SDR) configurations of Table 4.2.]
each group), the difference between the two t-tests remains remarkable (Figures 4.7(b)-3 and (b)-4). Note that the Mann-Whitney U test does not perform better here, as might be expected; this is mostly because its assumptions are not met in the configurations of the simulated data. Nevertheless, we did not observe the same trend under gamma distributions, which indicates that using Welch’s t-test as a nonparametric test is not universally valid, especially when the sample distribution deviates severely from the normal shape or the two group standard deviations differ extremely (Figure 4.7(c)).
To sum up, Welch’s t-test is preferred to Student’s t-test under the heteroscedasticity framework and unbalanced designs; Welch’s t-test can even be used in nonparametric situations, as long as the number of observations is adequate and the data do not originate from two populations with extremely different variances.
4.3.2 Power analysis
4.3.2.1 Empirical powers under different assumptions
In general, the power of hypothesis tests improves either with increasing sample sizes or with increasing effect sizes (Tables 4.3 - 4.4; see detailed results in Tables 4.17 - 4.20 in appendix). This is reasonable: if the effect size is small, we need more observations to detect significance; if the effect size is large, we can detect a significant effect without a large number of observations (e.g., with fewer than 50 observations in total).
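This trade-off between effect size and sample size can be illustrated with a small simulation. The sketch below (Python rather than our R code; the replication count, sample sizes and seed are arbitrary) estimates the empirical power of Welch’s t-test for normal data:

```python
import numpy as np
from scipy import stats

def empirical_power(d, n, reps=5000, alpha=0.05, seed=0):
    """Empirical power of Welch's t-test for a true standardised mean
    difference d, with n observations per group and unit variances."""
    rng = np.random.default_rng(seed)
    hits = sum(
        stats.ttest_ind(rng.normal(d, 1.0, n), rng.normal(0.0, 1.0, n),
                        equal_var=False).pvalue < alpha
        for _ in range(reps)
    )
    return hits / reps

p_small = empirical_power(d=0.2, n=25)     # small effect, small samples
p_large_d = empirical_power(d=0.8, n=25)   # large effect, same samples
p_large_n = empirical_power(d=0.2, n=200)  # small effect, large samples
print(p_small, p_large_d, p_large_n)
```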
When the assumptions of normality and homogeneity of variances are met (Assumption 1), Student’s t-test is more powerful in detecting the mean difference between two groups, but the difference between the two t-tests is extremely small (about 0.1% on average) (Table 4.3). When the assumption of equal variances is violated, Welch’s t-test loses more power, and the difference in power rates between the two t-tests increases to 0.5% on average (Table 4.3). When we are working with two skew-normally distributed groups with equal variances (Assumption 3), the Mann-Whitney U test shows more power than the t-tests, especially in small samples; besides, Welch’s t-test appears more powerful than Student’s t-test in detecting the difference between the two group means (Table 4.4). When the assumptions of normality and homogeneity of variances are both violated (Assumption 4), Welch’s t-test possesses consistently higher power than Student’s t-test and is even more powerful than the Mann-Whitney U test, which confirms the potential robustness of Welch’s t-test in non-parametric situations. Note that our simulation study does not show distinct advantages of the Mann-Whitney U test under non-parametric conditions, especially when the assumptions of normality and homogeneity of variances are both violated. This reminds us that the application of the Mann-Whitney U test still has restrictions and that it may not be an optimal choice under Assumption 4.
Note. Observed effect sizes are calculated from Hedge’s gs (formula (3)). Selected effect size values ranging from 0 to 2 are categorised into three intervals following Cohen’s criterion (described in 2.1). Mean(power) is the average of the observed power rates in each category for the specific combination of effect size and sample size. Difference is calculated as the power of Student’s t-test minus the power of Welch’s t-test: a positive value indicates larger power for Student’s t-test, while a negative value indicates that Welch’s t-test is more powerful.
Table 4.3: Summary of simulation results of powers of Student’s t-test and Welch’s t-test under normal distributions

Assumption  Total sample size  Observed effect size  Student’s t-test  Welch’s t-test  Difference
1           <50                (0, 0.2)              0.0576            0.0569          0.0007
1           <50                [0.2, 0.8)            0.3813            0.3753          0.0060
1           <50                >0.8                  0.9808            0.9793          0.0015
1           [50, 100)          (0, 0.2)              0.0641            0.0640          0.0002
1           [50, 100)          [0.2, 0.8)            0.6060            0.6010          0.0050
1           [50, 100)          >0.8                  0.9998            0.9998          0.0000
1           >100               (0, 0.2)              0.0839            0.0837          0.0002
1           >100               [0.2, 0.8)            0.8757            0.8750          0.0007
1           >100               >0.8                  1.0000            1.0000          0.0000
2           <50                (0, 0.2)              0.0679            0.0576          0.0104
2           <50                [0.2, 0.8)            0.2948            0.2878          0.0071
2           <50                >0.8                  0.9270            0.9182          0.0087
2           [50, 100)          (0, 0.2)              0.0821            0.0643          0.0178
2           [50, 100)          [0.2, 0.8)            0.4805            0.4866          -0.0061
2           [50, 100)          >0.8                  0.9900            0.9834          0.0065
2           >100               (0, 0.2)              0.0994            0.0853          0.0141
2           >100               [0.2, 0.8)            0.7910            0.8038          -0.0128
2           >100               >0.8                  0.9999            0.9999          0.0000
Table 4.4: Summary of simulation results of powers of Student’s t-test, Welch’s t-test and the Mann-Whitney U test under skewed distributions

Assumption  Distribution  Total sample size  Observed effect size  Student’s t-test  Welch’s t-test  Mann-Whitney U  Difference between t-tests
3           Skew-normal   <50                >0.8                  0.6889            0.6821          0.6912          0.0067
3           Skew-normal   [50, 100)          >0.8                  0.9261            0.9273          0.9402          -0.0012
3           Skew-normal   >100               >0.8                  0.9990            0.9991          0.9995          0.0000
4           Skew-normal   <50                [0.2, 0.8)            0.3583            0.4038          0.4476          -0.0455
4           Skew-normal   <50                >0.8                  0.8244            0.8583          0.8484          -0.0338
4           Skew-normal   [50, 100)          [0.2, 0.8)            0.5431            0.6685          0.7384          -0.1255
4           Skew-normal   [50, 100)          >0.8                  0.9837            0.9934          0.9930          -0.0097
4           Skew-normal   >100               [0.2, 0.8)            0.9020            0.9273          0.9734          -0.0253
4           Skew-normal   >100               >0.8                  1.0000            1.0000          1.0000          0.0000
4           SAS           <50                [0.2, 0.8)            0.3423            0.3659          0.3054          -0.0237
4           SAS           <50                >0.8                  0.8559            0.8753          0.8090          -0.0194
4           SAS           [50, 100)          [0.2, 0.8)            0.6001            0.6426          0.5270          -0.0425
4           SAS           [50, 100)          >0.8                  0.9949            0.9968          0.9827          -0.0019
4           SAS           >100               [0.2, 0.8)            0.9380            0.9423          0.8776          -0.0044
4           SAS           >100               >0.8                  1.0000            1.0000          1.0000          0.0000
4           Gamma         <50                [0.2, 0.8)            0.5044            0.5700          0.5808          -0.0656
4           Gamma         <50                >0.8                  0.8175            0.8257          0.8215          -0.0082
4           Gamma         [50, 100)          [0.2, 0.8)            0.7736            0.8264          0.9124          -0.0528
4           Gamma         [50, 100)          >0.8                  0.9500            0.9684          0.9383          -0.0183
4           Gamma         >100               [0.2, 0.8)            0.9897            0.9944          0.9997          -0.0047
4           Gamma         >100               >0.8                  0.9948            0.9948          0.9980          0.0000
4.3.2.2 Required sample sizes to achieve a power of 80%
Two sets of required sample sizes to achieve a power of 80% for Student’s t-test and Welch’s t-test are calculated by controlling either the effect sizes (Table 4.21 in appendix) or the true mean differences in the populations (Table 4.22 in appendix).
Given the same effect size values, Welch’s t-test requires fewer observations than Student’s t-test (Figure 4.8), and this finding is consistent across all parameter configurations of the two groups (Table 4.21 in appendix). Independently of the homogeneity of variances and the experimental design (balanced/unbalanced), Welch’s t-test always outperforms Student’s t-test in requiring the fewest observations in the control group to gain the same power. This is consistent with previous findings in Table 4.21 in appendix: the difference between the two group means always needs to be larger for Welch’s t-test than for Student’s t-test if their effect size estimators have the same values (Cohen’s ds = Shieh’s d); therefore, fewer observations are required to detect the larger mean differences and gain the same power.
To gain a power above 80% in detecting the same mean difference, Student’s t-test and Welch’s t-test require similar sample sizes either under balanced designs or when assuming equal variances in the two groups (Table 4.22 in appendix). The two t-tests differ in the minimum sample sizes when we are working with two unbalanced groups without assuming homogeneity of variances. When there are fewer/more observations in the group with the smaller/bigger standard deviation (e.g. SDR < 1 and SSR < 1), Welch’s t-test requires fewer observations (Figure 4.8, Table 4.22 in appendix); on the contrary, when the larger variance is associated with the smaller group (e.g. SDR < 1, SSR > 1), Student’s t-test is preferred as it requires fewer observations to reach the same power (Figure 4.8, Table 4.22 in appendix).
To sum up, the power analysis shows that Welch’s t-test is slightly less powerful than Student’s t-test under the normality assumption, but it is clearly more powerful than Student’s t-test under non-parametric conditions, especially when both the assumptions of normality and homogeneity of variances are violated. The simulation results also confirm that Welch’s t-test more frequently achieves larger power rates than the Mann-Whitney U test under both the homoscedasticity and the heteroscedasticity frameworks for non-normal distributions.
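The minimum sample size for a target power can also be obtained directly from the noncentral t-distribution rather than by simulation. A sketch for the simplest case (Student’s t-test, balanced design, equal variances; the α and target power match the values used in this chapter):

```python
import numpy as np
from scipy import stats

def required_n(d, alpha=0.05, power=0.80):
    """Smallest per-group n for a two-sided, two-sample Student t-test
    (balanced design, equal variances) to reach the target power."""
    for n in range(2, 100000):
        df = 2 * n - 2
        ncp = d * np.sqrt(n / 2)              # noncentrality parameter
        t_crit = stats.t.ppf(1 - alpha / 2, df)
        achieved = (1 - stats.nct.cdf(t_crit, df, ncp)
                    + stats.nct.cdf(-t_crit, df, ncp))
        if achieved >= power:
            return n
    return None

print(required_n(0.5))  # medium effect
print(required_n(0.2))  # a small effect needs far more observations
```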
[Figure 4.8: Required sample size (0-2500) to gain a power of 80% for Student’s t-test and Welch’s t-test, as a function of effect size and mean difference (0.1, 0.3, 0.5, 0.8, 1), for configurations such as SDR = 0.5 with SSR = 0.5 and SSR = 2. SDR: Standard Deviation Ratio; SSR: Sample Size Ratio.]
5 Discussion
In applied research, reporting effect sizes accompanied by the corresponding confidence intervals has steadily shown its importance in communicating the practical significance of results (Lakens, 2013). Previous research has highlighted that the choice of effect size measures and of approaches to build the associated confidence intervals is highly dependent on the specific conditions (Algina et al., 2005; Cumming & Finch, 2001; Grissom & Kim, 2001; Lakens, 2013; Shieh, 2013); unfortunately, few studies have properly reported effect sizes over the last several decades (Kotrlik et al., 2011). It is therefore of great relevance to compare different effect size indicators and confidence interval approaches under diverse conditions, in order to provide researchers with a guideline for choosing proper effect size measures and their confidence intervals.
In the context of group comparisons, Cohen’s ds, which relies on the often untenable assumptions of normality and homogeneity of variances, is the dominant effect size measure used by researchers. However, the consequences of the arbitrary use of Cohen’s ds when the assumptions are violated, and the superiority of alternatives to Cohen’s ds under specific conditions, remain vague. This study conducted a comprehensive comparison through Monte Carlo simulations of Cohen’s ds and its alternatives across assumptions, regarding the point estimators, confidence intervals and related statistical tests.
Firstly, we explored how Cohen’s ds and other effect size indices react in terms of bias, consistency and variance across the different assumptions of normality and homogeneity of variances. Based on a systematic review of proposed effect size indices under different assumptions (summarised in Table 2.1), we conducted a Monte Carlo simulation study of effect size estimators including Cohen’s ds, Hedge’s gs, Glass’s Δ and Shieh’s d under normal conditions, and Cohen’s ds, Hedge’s gs, Shieh’s d, dMAD, dR and PSindep under non-normal distributions. The simulation results reveal that Shieh’s d is the best parametric effect size measure: it mostly performs with bias rates comparable to or lower than those of Cohen’s ds and maintains higher precision across almost all assumptions. The only exception is when the group with fewer observations has the larger variance under the normality assumption, where Shieh’s d appears more biased than Cohen’s ds. On the other hand, the higher bias rates of Cohen’s ds under non-normality warn that using classic methods for mean comparisons and assuming all is well is evidently unsatisfactory.
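The parametric estimators just discussed differ only in their standardisers. A sketch of the sample versions in Python, following the standard definitions in Cohen (1988), Hedges and Olkin (1985) and Shieh (2013), with the common approximation for the bias-correction factor; the example data are arbitrary:

```python
import numpy as np

def cohens_ds(x, y):
    """Cohen's ds: mean difference over the pooled standard deviation."""
    n1, n2 = len(x), len(y)
    sp = np.sqrt(((n1 - 1) * np.var(x, ddof=1) + (n2 - 1) * np.var(y, ddof=1))
                 / (n1 + n2 - 2))
    return (np.mean(x) - np.mean(y)) / sp

def hedges_gs(x, y):
    """Hedge's gs: Cohen's ds with the small-sample bias correction
    J ~ 1 - 3 / (4 df - 1)."""
    df = len(x) + len(y) - 2
    return cohens_ds(x, y) * (1 - 3 / (4 * df - 1))

def shiehs_d(x, y):
    """Shieh's d: standardiser weights the group variances by the
    sample-size proportions q_i = n_i / N."""
    n1, n2 = len(x), len(y)
    q1, q2 = n1 / (n1 + n2), n2 / (n1 + n2)
    denom = np.sqrt(np.var(x, ddof=1) / q1 + np.var(y, ddof=1) / q2)
    return (np.mean(x) - np.mean(y)) / denom

x = np.array([4.1, 5.0, 6.2, 5.5, 4.8, 6.0])
y = np.array([3.2, 4.1, 3.8, 4.6, 3.5, 4.4])
print(cohens_ds(x, y), hedges_gs(x, y), shiehs_d(x, y))
```

Note that for a balanced design with equal sample variances, shiehs_d returns exactly half of cohens_ds, the inconsistency discussed below.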
Notably, our analysis did not confirm the expected advantages of recommended effect size point estimators such as dMAD, dR and PSindep when dealing with non-normally distributed data. One possible explanation is that our simulation configurations did not include extremely skewed data, where using medians or the techniques of trimming and winsorisation may offer great power for handling outliers. In this respect, further explorations could include more distinct distributions with extremely heavy tails and test more values of the trimming percentage, such as 10%, 15% and 30%. Furthermore, it is not convincing that PSindep can be used as a default alternative to Cohen’s ds in all non-parametric cases, because 1) it does not reflect the difference between two group means but between two distributions, and 2) its robustness to non-normality is threatened by heteroscedasticity. Concisely, it is worth noting that different methods are sensitive to different features of the data and can provide different perspectives of practical importance. Hence, more modern techniques with practical advantages have to be developed.
The simulation results provide strong evidence to support the use of Shieh’s d as the default effect size measure when we cannot make any assumptions about the data. However, Cumming (2013) contended that its standardiser has two serious deficiencies. Firstly, in the case of equal variances and a balanced design, Shieh’s d does not reduce to Cohen’s ds as most commonly defined (in fact, Shieh’s d = ½ Cohen’s ds). Therefore, it lacks consistency with the most familiar version of Cohen’s ds. Secondly, it does not give a readily interpretable version of Cohen’s ds, since the standardiser depends on the relative sample sizes, indexed by qi (i.e., choosing different sample sizes for the experiment changes the measurement unit of Shieh’s d). Evidently, further studies are needed to overcome these limitations.
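Concretely, with Shieh’s standardiser written as in Shieh (2013), the first objection follows from a one-line reduction: under equal variances (σ₁ = σ₂ = σ) and a balanced design (q₁ = q₂ = 1/2, where qᵢ = nᵢ/N),

```latex
d_{\mathrm{Shieh}}
  = \frac{\mu_1 - \mu_2}{\sqrt{\sigma_1^2/q_1 + \sigma_2^2/q_2}}
  = \frac{\mu_1 - \mu_2}{\sqrt{2\sigma^2 + 2\sigma^2}}
  = \frac{\mu_1 - \mu_2}{2\sigma}
  = \tfrac{1}{2}\,\frac{\mu_1 - \mu_2}{\sigma}
  = \tfrac{1}{2}\, d_{\mathrm{Cohen}} ,
```

so choosing different sample-size proportions rescales the measurement unit of the index.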
One of the major challenges in the simulation study is that it is not feasible to unify the parameter configurations under normal and non-normal distributions, as non-normal distributions such as the SAS-normal and Gamma are not defined by a mean and a standard deviation. As a consequence, the behaviour of Cohen’s ds under the normality and non-normality assumptions is not directly comparable. To extend this study, we believe that within-estimator comparisons controlling all parameters across distributions would be a meaningful next step.
After comparing the performance of the effect size point estimators, we assessed the accuracy and precision of four confidence intervals constructed through parametric and non-parametric
approaches. Under the assumption of normality, the NC method around Shieh’s d is considered the optimal method because it always gives the most accurate interval estimates regardless of experimental design and sample size, while the BS BCa interval around Cohen’s ds shows great advantages under non-parametric conditions, especially when the observations are adequate. In terms of precision, NC intervals around Shieh’s d are the narrowest intervals of all.
Notably, the configurations of the mean difference in our simulations only represent right-shifted cases, and in the analysis we should take into account that some observed tendencies are related to this fact. For instance, the observed standard deviations of the lower confidence bounds are always larger than those of the upper bounds, and the difference increases with the magnitude of the location shift (Tables 4.5 - 4.16 in appendix), but we would not assume this to be a universal rule. We would expect to observe the opposite trend when the location of the second distribution is left-shifted.
Lastly, we compared type I error rates and empirical powers (including the minimum sample sizes to obtain certain power levels) between Student’s t-test and Welch’s t-test, since an effect size is always associated with a hypothesis test. Overall, Welch’s t-test outperforms Student’s t-test in type I error rates, at a small cost in power, yet it requires fewer observations than Student’s t-test to achieve the same power. The exceptions are consistent with the findings on Shieh’s d when there are fewer observations in the group with the larger standard deviation. Remarkably, the non-parametric test (Mann-Whitney U test) produces higher type I error rates under non-normal conditions in the simulation study, possibly for two reasons: 1) we generated the null hypothesis by setting the mean difference equal to zero, which is not exactly the null hypothesis of the Mann-Whitney U test; 2) although the Mann-Whitney U test leaves the data distribution free, it still assumes that the two distributions have the same shape, so unequal shapes will lead to rejection of the hypothesis of equal distributions (Zimmerman, 2003). As extensions to the simulation study of hypothesis tests, we propose to include more significance levels (e.g., α = 0.025, 0.01), power rates (e.g., power = 85%, 90%, 95%) and diverse data types.
In summary, this study comprehensively analysed and compared the effect size and confidence interval estimates of Cohen’s ds and its alternatives across the different assumptions of normality and homogeneity of variances in two independent groups. It contributes to the effect size literature by providing a guideline for choosing the appropriate effect size estimator and confidence interval approach to convey quantitative information in applied research.
6 References
Algina, J., Keselman, H. J., & Penfield, R. D. (2005). An alternative to Cohen's standardized
mean difference effect size: A robust parameter and confidence interval in the two
independent groups case. Psychological Methods, 10(3), 317-328. DOI:
10.1037/1082-989X.10.3.317
Azzalini, A. (1985). A class of distributions which includes the normal ones. Scandinavian
Journal of Statistics, 12(2), 171-178. Retrieved from http://www.jstor.org/stable/4615982
Charness, G., Gneezy, U. & Kuhn, M. (2012). Experimental methods: Between-subject and
within-subject design. Journal of Economic Behavior and Organization, 81(1), 1-8. DOI:
10.1016/j.jebo.2011.08.009
Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences, Academic Press;
Cambridge, Massachusetts. DOI: 10.1016/C2013-0-10517-X
Cumming, G. (2013). Cohen’s d needs to be readily interpretable: comment on Shieh.
Behavior Research Methods, 45(4), 968-971. DOI: 10.3758/s13428-013-0392-4
Cumming, G., & Finch, S. (2001). A primer on the understanding, use, and calculation of
confidence intervals that are based on central and noncentral distributions. Educational
and Psychological Measurement, 61(4), 532–574. DOI: 10.1177/0013164401614002
Cumming, G., & Calin-Jageman, R. (2017). Introduction to the New Statistics: Estimation, Open Science, and Beyond. Routledge; New York.
Delacre, M., Lakens, D., & Leys, C. (2017). Why psychologists should by default use Welch’s
t-test instead of Student’s t-test. International Review of Social Psychology, 30(1), 92–101.
DOI: 10.5334/irsp.82
Ferguson, C. J. (2009). An effect size primer: A guide for clinicians and researchers.
Professional Psychology: Research & Practice, 40, 532-538. DOI: 10.1037/a0015808
Glass, G. (1976). Primary, secondary, and meta-analysis of research. Educational Researcher,
5(10), 3-8. Retrieved from http://www.jstor.org/stable/1174772
Grissom, R. J., & Kim, J. J. (2005). Effect sizes for research : a broad practical approach.
Lawrence Erlbaum Associates, Mahwah, N.J.; London
Grissom, R. J., & Kim, J. J. (2001). Review of assumptions and problems in the appropriate
conceptualization of effect size. Psychological Methods, 6(2), 135-146. DOI:
10.1037/1082-989X.6.2.135
Harrison, D., & Brady, A. (2004). Sample size and power calculations using the noncentral t-distribution. The Stata Journal, 4(2), 142-153. DOI: 10.1177/1536867X0400400205
Hedges, L. V., & Olkin, I. (1985). Statistical Methods for Meta-analysis. Academic Press;
Cambridge, Massachusetts. DOI: 10.1016/C2009-0-03396-0
Jones, M., & Pewsey, A. (2009). Sinh-arcsinh distributions. Biometrika, 96(4), 761-780.
Retrieved from http://www.jstor.org/stable/27798865
Kelley, K. (2005). The effects of nonnormal distributions on confidence intervals around the
standardized mean difference: Bootstrap and parametric confidence intervals. Educational
and Psychological Measurement, 65(1), 51–69. DOI: 10.1177/0013164404264850
Kelley, K., & Preacher, K. (2012). On effect size. Psychological Methods, 17(2), Jun 2012,
137-152. DOI: 10.1037/a0028086
Kotrlik, J.W., Williams, H.A. & Jabor, M.K. (2011). Reporting and interpreting effect size in
quantitative agricultural education research. Journal of Agricultural Education, 52(1),
132–142. DOI: 10.5032/jae.2011.01132
Kim, S., & Cohen, A. (1998). On the Behrens-Fisher problem: A review. Journal of
Educational and Behavioral Statistics, 23(4), 356-377. Retrieved from http://
www.jstor.org/stable/1165281
Kulinskaya, E. & Staudte, R. G. (2007). Confidence intervals for the standardized effect
arising in the comparison of two normal populations. Statistics in Medicine, 26(14),
2853-71. DOI: 10.1002/sim.2751
Lakens, D. (2013). Calculating and reporting effect sizes to facilitate cumulative science: A
practical primer for t-tests and ANOVAs. Frontiers in Psychology, 4, Article ID 863. DOI:
10.3389/fpsyg.2013.00863
Leys, C., Ley, C., Klein, O., Bernard, P., & Licata, L. (2013). Detecting outliers: do not use
standard deviation around the mean, use absolute deviation around the median. Journal of
Experimental Social Psychology, 49(4), 764–766. DOI: 10.1016/j.jesp.2013.03.013
Shieh, G. (2013). Confidence intervals and sample size calculations for the standardized mean
difference effect size between two normal populations under heteroscedasticity. Behavior
Research Methods, 45(4), 955-967. DOI: 10.3758/s13428-013-0320-7
Steiger, J. H., & Fouladi, R. T. (1997). Noncentrality interval estimation and the evaluation of
statistical models. In Harlow, L. L., Mulaik, S. A. & Steiger, J. H. (Eds.)(2016). What if
there were no significance tests? Routledge; New York.
Keselman, H.J., Wilcox, R.R., Othman, A.R., & Fradette, K. (2002). Trimming, transforming statistics, and bootstrapping: Circumventing the biasing effects of heteroscedasticity and nonnormality. Journal of Modern Applied Statistical Methods, 1(2), 288-309. DOI: 10.22237/jmasm/1036109820
Zimmerman, D.W. (2003). A warning about the large-sample Wilcoxon-Mann-Whitney Test.
Understanding Statistics, 2(4), 267-280. DOI: 10.1207/S15328031US0204_03
! of !44 44
APPENDIX
Figure 1: Distributions in the simulation study
Table 4.1: Bias rates and variances of ES estimators under Assumption 1 (assume σ1 = σ2 = 1, μ2 = 0)
Table 4.2: Bias rates and variances of ES estimators under Assumption 2 (assume σ2 = 1, μ2 = 0)
Table 4.3: Bias rates and variances of ES estimators under Assumption 3 (assume σ1 = σ2 = 1, μ2 = 0)
Table 4.4: Bias rates and variances of ES estimators under Assumption 4
Table 4.5: NC and BS BCa CIs under Normal distributions (assume n1 = n2 = 10, σ1 = σ2 = 1)
Table 4.6: NC and BS BCa CIs under Normal distributions (assume n1 = n2 = 10, σ1 = 2, σ2 = 1)
Table 4.7: NC and BS BCa CIs under Normal distributions (assume n1 = 20, n2 = 10, σ1 = σ2 = 1)
Table 4.8: NC and BS BCa CIs under Normal distributions (assume n1 = 20, n2 = 10, σ1 = 2, σ2 = 1)
Table 4.9: NC and BS BCa CIs under Normal distributions (assume n1 = 75, n2 = 50, σ1 = σ2 = 1)
Table 4.10: NC and BS BCa CIs under Normal distributions (assume n1 = 75, n2 = 50, σ1 = 2, σ2 = 1)
Table 4.11: NC and BS BCa CIs under Skew-normal distributions (assume n1 = n2 = 10, σ1 = σ2 = 1)
Table 4.12: NC and BS BCa CIs under Skew-normal distributions (assume n1 = n2 = 10, σ1 = 2, σ2 = 1)
Table 4.13: NC and BS BCa CIs under Skew-normal distributions (assume n1 = 20, n2 = 10, σ1 = σ2 = 1)
Table 4.14: NC and BS BCa CIs under Skew-normal distributions (assume n1 = 20, n2 = 10, σ1 = 2, σ2 = 1)
Table 4.15: NC and BS BCa CIs under Skew-normal distributions (assume n1 = 75, n2 = 50, σ1 = σ2 = 1)
Table 4.16: NC and BS BCa CIs under Skew-normal distributions (assume n1 = 75, n2 = 50, σ1 = 2, σ2 = 1)
Table 4.17: Type I error rates and powers in Student's and Welch's t-tests (Normal distributions)
Table 4.18: Type I error rates and powers in Student's and Welch's t-tests, and Mann-Whitney U tests (Skew-normal distributions)
Table 4.19: Type I error rates and powers in Student's and Welch's t-tests, and Mann-Whitney U tests (Sinh-Arcsinh normal distributions)
Table 4.20: Type I error rates and powers in Student's and Welch's t-tests, and Mann-Whitney U tests (Gamma distributions)
Table 4.21: Required sample sizes in t-tests to achieve 80% power (under constant effect sizes)
Table 4.22: Required sample sizes in t-tests to achieve 80% power (under constant mean difference)
Syntax 1: R-functions to obtain confidence intervals
Syntax 2: R-functions to calculate the required sample size
Figure 1: Distributions in the simulation study
Note.
(a) Normal densities with SD = 1 and increasing mean differences via a right location shift: mean difference = 0.1, 0.5, 0.8, 1.5, 2, 5;
(b) Normal densities with mean difference = 2 and varying SDs: SD = 0.5 and 1.5;
(c) Skew-normal densities with location shifts = -1 and -2 and skewness = 2 and 3, respectively;
(d) Sinh-Arcsinh normalised densities σ_{ε,1} f_{ε,1}(σ_{ε,1} x + μ_{ε,1}) with ε = -0.5 and 1;
(e) Gamma densities with shape = 2 and scales = 0.5 and 1.
[Density plots omitted: (a) Normal densities with equal variances; (b) Normal densities with unequal variances (sd = 0.5, sd = 1.5); (c) Skew-normal densities (skewness = 2, 3); (d) Sinh-Arcsinh normal densities (ε = 1, ε = -0.5); (e) Gamma densities (scale = 1, scale = 0.5).]
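For readers who want to reproduce draws from the benchmark distributions of Figure 1, the following Python sketch samples from each family. The thesis's own simulations were run in R; the seed, the sample size, and fixing the Sinh-Arcsinh tail parameter δ at 1 (so that ε alone controls the skew) are assumptions here, not the thesis's settings.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2024)  # arbitrary seed (assumption)
n = 10_000

# (a)/(b) Normal draws with a location shift and unequal SDs
normal_a = rng.normal(loc=0.5, scale=1.0, size=n)
normal_b = rng.normal(loc=2.0, scale=0.5, size=n)

# (c) Skew-normal with shape (skewness) parameter a = 2 and location shift -1
skewn = stats.skewnorm.rvs(a=2, loc=-1, size=n, random_state=rng)

# (d) Sinh-Arcsinh transform of a standard normal, X = sinh(arcsinh(Z) + eps):
#     the Jones-Pewsey family with tail parameter delta fixed at 1 (assumption)
z = rng.standard_normal(n)
sas = np.sinh(np.arcsinh(z) + 1.0)  # eps = 1 produces right skew

# (e) Gamma with shape = 2 and scale = 0.5 (mean = shape * scale = 1)
gam = stats.gamma.rvs(a=2, scale=0.5, size=n, random_state=rng)
```

A positive ε skews the Sinh-Arcsinh density to the right and a negative ε to the left, matching the ε = 1 and ε = -0.5 panels.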
Table 4.1: Bias rates and variances of ES estimators under Assumption 1 (assume σ1=σ2=1, µ2 = 0)
n1 n2 | μ1 − μ2 | Cohen's ds: Bias rate, Variance | Hedges's gs: Bias rate, Variance | Glass's ds: Bias rate, Variance | Shieh's d: Bias rate, Variance
10 10
0.1 0.0349 0.2260 0.0347 0.2073 0.0875 0.2592 0.0349 0.0565
0.5 0.0403 0.2345 0.0402 0.2150 0.0890 0.2786 0.0403 0.0586
0.8 0.0449 0.2470 0.0447 0.2265 0.0949 0.3138 0.0449 0.0617
1.5 0.0407 0.3013 0.0406 0.2763 0.0906 0.4511 0.0407 0.0753
2.0 0.0437 0.3653 0.0436 0.3350 0.0935 0.6096 0.0437 0.0913
5.0 0.0438 1.0911 0.0437 1.0006 0.0947 2.4913 0.0438 0.2728
15 10
0.1 0.0220 0.1832 0.0219 0.1713 0.0859 0.2169 0.0286 0.0447
0.5 0.0360 0.1900 0.0359 0.1776 0.0968 0.2371 0.0418 0.0465
0.8 0.0365 0.1996 0.0364 0.1867 0.0963 0.2726 0.0420 0.0493
1.5 0.0350 0.2391 0.0350 0.2235 0.0950 0.4119 0.0407 0.0604
2.0 0.0346 0.2867 0.0345 0.2681 0.0958 0.5697 0.0406 0.0736
5.0 0.0346 0.8322 0.0345 0.7782 0.0950 2.4082 0.0404 0.2263
20 10
0.1 0.0278 0.1628 0.0277 0.1541 0.0859 0.1942 0.0383 0.0375
0.5 0.0250 0.1657 0.0250 0.1569 0.0932 0.2146 0.0399 0.0388
0.8 0.0308 0.1729 0.0308 0.1637 0.0976 0.2483 0.0453 0.0414
1.5 0.0273 0.2067 0.0273 0.1956 0.0916 0.3864 0.0408 0.0525
2.0 0.0259 0.2418 0.0259 0.2289 0.0916 0.5385 0.0400 0.0646
5.0 0.0284 0.6780 0.0283 0.6418 0.0946 2.4020 0.0424 0.2120
20 20
0.1 0.0171 0.1052 0.0171 0.1010 0.0375 0.1116 0.0171 0.0263
0.5 0.0178 0.1084 0.0177 0.1041 0.0392 0.1190 0.0178 0.0271
0.8 0.0214 0.1149 0.0214 0.1104 0.0432 0.1326 0.0214 0.0287
1.5 0.0206 0.1387 0.0206 0.1333 0.0418 0.1845 0.0206 0.0347
2.0 0.0207 0.1626 0.0207 0.1562 0.0422 0.2397 0.0207 0.0407
5.0 0.0208 0.4711 0.0207 0.4526 0.0431 0.9233 0.0208 0.1178
30 20
0.1 0.0094 0.0873 0.0094 0.0846 0.0354 0.0939 0.0121 0.0211
0.5 0.0135 0.0897 0.0135 0.0869 0.0384 0.1007 0.0159 0.0218
0.8 0.0152 0.0942 0.0152 0.0913 0.0413 0.1140 0.0179 0.0231
"n1
Cohen’s !ds Glass’s !ds!μ1 − μ2
Hedges’s "gs"n2
Page ! of !2 52
30 20
1.5 0.0161 0.1120 0.0161 0.1085 0.0423 0.1667 0.0189 0.0281
2.0 0.0164 0.1327 0.0164 0.1286 0.0421 0.2228 0.0191 0.0339
5.0 0.0164 0.3713 0.0164 0.3597 0.0420 0.9027 0.0190 0.1006
40 20
0.1 0.0238 0.0777 0.0238 0.0757 0.0563 0.0842 0.0321 0.0176
0.5 0.0085 0.0799 0.0085 0.0778 0.0364 0.0918 0.0149 0.0183
0.8 0.0150 0.0832 0.0150 0.0811 0.0444 0.1047 0.0220 0.0195
1.5 0.0127 0.0977 0.0127 0.0952 0.0410 0.1556 0.0193 0.0244
2.0 0.0131 0.1152 0.0131 0.1122 0.0415 0.2138 0.0197 0.0302
5.0 0.0139 0.3083 0.0138 0.3003 0.0431 0.8944 0.0208 0.0954
50 50
0.1 0.0036 0.0409 0.0036 0.0402 0.0122 0.0418 0.0036 0.0102
0.5 0.0067 0.0420 0.0067 0.0414 0.0144 0.0443 0.0067 0.0105
0.8 0.0072 0.0440 0.0072 0.0434 0.0151 0.0485 0.0072 0.0110
1.5 0.0072 0.0526 0.0072 0.0518 0.0150 0.0662 0.0072 0.0131
2.0 0.0076 0.0626 0.0076 0.0617 0.0158 0.0866 0.0076 0.0157
5.0 0.0080 0.1731 0.0080 0.1705 0.0154 0.3170 0.0080 0.0433
75 50
0.1 0.0201 0.0338 0.0201 0.0334 0.0287 0.0347 0.0208 0.0081
0.5 0.0066 0.0349 0.0066 0.0345 0.0163 0.0374 0.0078 0.0084
0.8 0.0066 0.0362 0.0066 0.0358 0.0162 0.0414 0.0076 0.0088
1.5 0.0054 0.0432 0.0054 0.0427 0.0151 0.0594 0.0065 0.0108
2.0 0.0060 0.0507 0.0060 0.0501 0.0150 0.0786 0.0069 0.0128
5.0 0.0061 0.1381 0.0061 0.1364 0.0159 0.3116 0.0072 0.0374
100 50
0.1 0.0027 0.0303 0.0027 0.0300 0.0081 0.0312 0.0001 0.0068
0.5 0.0055 0.0313 0.0055 0.0310 0.0162 0.0340 0.0082 0.0071
0.8 0.0061 0.0326 0.0061 0.0322 0.0168 0.0384 0.0088 0.0075
1.5 0.0056 0.0387 0.0056 0.0383 0.0161 0.0567 0.0081 0.0095
2.0 0.0056 0.0445 0.0056 0.0441 0.0163 0.0757 0.0083 0.0115
5.0 0.0050 0.1160 0.0050 0.1149 0.0162 0.3064 0.0079 0.0355
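The four estimators benchmarked in Tables 4.1 and 4.2 can be computed directly from two samples. Below is a minimal Python sketch (the thesis's own Syntax appendix is in R; the approximate form of the Hedges correction and the choice of group 2 as the "control" in Glass's denominator are assumptions here):

```python
import numpy as np

def es_estimators(x, y):
    """Point estimates of Cohen's ds, Hedges's gs, Glass's ds and Shieh's d."""
    n1, n2 = len(x), len(y)
    m1, m2 = np.mean(x), np.mean(y)
    v1, v2 = np.var(x, ddof=1), np.var(y, ddof=1)
    # Cohen's ds: mean difference over the pooled SD
    sp = np.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    ds = (m1 - m2) / sp
    # Hedges's gs: small-sample bias correction of ds (approximate form)
    gs = ds * (1 - 3 / (4 * (n1 + n2) - 9))
    # Glass's ds: standardize by a single group's SD (group 2 taken as control)
    glass = (m1 - m2) / np.sqrt(v2)
    # Shieh's d: each variance weighted by its group's sample fraction
    q1, q2 = n1 / (n1 + n2), n2 / (n1 + n2)
    shieh = (m1 - m2) / np.sqrt(v1 / q1 + v2 / q2)
    return ds, gs, glass, shieh
```

With equal group sizes and equal sample variances, Shieh's d equals exactly half of Cohen's ds, which is why its variances in Table 4.1 sit near one quarter of Cohen's.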
Table 4.2: Bias rates and variances of ES estimators under Assumption 2 (assume σ2=1, µ2 = 0)
n1 n2 σ1 | μ1 − μ2 | Cohen's ds: Bias rate, Variance | Hedges's gs: Bias rate, Variance | Glass's ds: Bias rate, Variance | Shieh's d: Bias rate, Variance
10 10
0.5
0.1 0.0574 0.2342 0.0573 0.2147 0.0914 0.1608 0.0574 0.0585
0.5 0.0621 0.2515 0.0620 0.2306 0.0974 0.1823 0.0621 0.0629
0.8 0.0626 0.2840 0.0625 0.2604 0.0972 0.2174 0.0626 0.0710
1.5 0.0584 0.4046 0.0583 0.3710 0.0930 0.3592 0.0584 0.1011
2.0 0.0589 0.5424 0.0588 0.4974 0.0939 0.5211 0.0589 0.1356
5.0 0.0593 2.1453 0.0592 1.9675 0.0939 2.3901 0.0593 0.5363
1.5
0.1 0.0884 0.2284 0.0883 0.2095 0.1274 0.4205 0.0884 0.0571
0.5 0.0430 0.2341 0.0429 0.2147 0.0873 0.4387 0.0430 0.0585
0.8 0.0493 0.2464 0.0492 0.2260 0.0919 0.4729 0.0493 0.0616
1.5 0.0485 0.2824 0.0484 0.2590 0.0923 0.6158 0.0485 0.0706
2.0 0.0508 0.3270 0.0507 0.2999 0.0959 0.7740 0.0508 0.0818
5.0 0.0508 0.8301 0.0507 0.7613 0.0945 2.6408 0.0508 0.2075
15 10
0.5
0.1 0.0443 0.2451 0.0442 0.2292 0.0895 0.1508 0.0608 0.0477
0.5 0.0504 0.2589 0.0504 0.2420 0.0957 0.1703 0.0668 0.0515
0.8 0.0497 0.2877 0.0497 0.2690 0.0952 0.2058 0.0663 0.0589
1.5 0.0484 0.3997 0.0484 0.3738 0.0941 0.3499 0.0650 0.0883
2.0 0.0466 0.5176 0.0466 0.4840 0.0916 0.5010 0.0629 0.1193
5.0 0.0480 1.9608 0.0479 1.8334 0.0942 2.3890 0.0648 0.5004
1.5
0.1 0.0384 0.1561 0.0383 0.1460 0.0961 0.3223 0.0348 0.0436
0.5 0.0374 0.1610 0.0373 0.1506 0.0944 0.3445 0.0335 0.0448
0.8 0.0377 0.1687 0.0376 0.1577 0.0946 0.3821 0.0339 0.0468
1.5 0.0376 0.1926 0.0375 0.1801 0.0949 0.5213 0.0339 0.0528
2.0 0.0386 0.2247 0.0385 0.2101 0.0958 0.6749 0.0349 0.0609
5.0 0.0381 0.5616 0.0380 0.5251 0.0951 2.5562 0.0342 0.1460
20 10
0.5
0.1 0.0325 0.2555 0.0325 0.2419 0.0840 0.1456 0.0604 0.0402
0.5 0.0432 0.2711 0.0432 0.2566 0.0974 0.1676 0.0728 0.0445
0.8 0.0418 0.2947 0.0417 0.2790 0.0972 0.2041 0.0719 0.0513
1.5 0.0407 0.3961 0.0407 0.3750 0.0948 0.3451 0.0702 0.0786
2.0 0.0392 0.4991 0.0391 0.4724 0.0917 0.4931 0.0677 0.1071
5.0 0.0410 1.8138 0.0410 1.7169 0.0951 2.3741 0.0705 0.4678
1.5
0.1 0.0200 0.1250 0.0200 0.1184 0.0819 0.2757 0.0195 0.0363
0.5 0.0349 0.1273 0.0349 0.1205 0.0976 0.2934 0.0344 0.0368
0.8 0.0322 0.1325 0.0322 0.1254 0.0973 0.3331 0.0326 0.0385
1.5 0.0315 0.1533 0.0315 0.1452 0.0961 0.4776 0.0316 0.0445
2.0 0.0304 0.1745 0.0303 0.1652 0.0930 0.6291 0.0300 0.0505
5.0 0.0301 0.4316 0.0300 0.4086 0.0925 2.4589 0.0297 0.1238
20 20
0.5
0.1 0.0316 0.1074 0.0316 0.1032 0.0458 0.0698 0.0316 0.0269
0.5 0.0270 0.1160 0.0270 0.1114 0.0412 0.0783 0.0270 0.0290
0.8 0.0273 0.1273 0.0273 0.1223 0.0414 0.0901 0.0273 0.0318
1.5 0.0276 0.1793 0.0276 0.1723 0.0420 0.1426 0.0276 0.0448
2.0 0.0273 0.2361 0.0272 0.2268 0.0416 0.1999 0.0273 0.0590
5.0 0.0276 0.9078 0.0276 0.8721 0.0421 0.8808 0.0276 0.2270
1.5
0.1 0.0287 0.1061 0.0286 0.1020 0.0482 0.1816 0.0287 0.0265
0.5 0.0248 0.1090 0.0248 0.1047 0.0421 0.1889 0.0248 0.0272
0.8 0.0272 0.1137 0.0272 0.1092 0.0465 0.2043 0.0272 0.0284
1.5 0.0227 0.1289 0.0227 0.1238 0.0404 0.2527 0.0227 0.0322
2.0 0.0243 0.1480 0.0243 0.1422 0.0413 0.3086 0.0243 0.0370
5.0 0.0243 0.3639 0.0243 0.3496 0.0428 0.9861 0.0243 0.0910
30 20
0.5
0.1 0.0245 0.1136 0.0245 0.1100 0.0436 0.0655 0.0319 0.0217
0.5 0.0230 0.1209 0.0230 0.1171 0.0419 0.0733 0.0303 0.0236
0.8 0.0233 0.1334 0.0233 0.1292 0.0421 0.0862 0.0306 0.0267
1.5 0.0235 0.1810 0.0235 0.1754 0.0426 0.1383 0.0309 0.0388
2.0 0.0226 0.2310 0.0226 0.2238 0.0414 0.1929 0.0299 0.0515
5.0 0.0235 0.8638 0.0235 0.8369 0.0427 0.8807 0.0310 0.2119
1.5
0.1 0.0129 0.0745 0.0129 0.0722 0.0356 0.1397 0.0107 0.0208
0.5 0.0176 0.0759 0.0176 0.0735 0.0409 0.1463 0.0156 0.0211
0.8 0.0174 0.0799 0.0174 0.0774 0.0419 0.1621 0.0157 0.0222
1.5 0.0171 0.0915 0.0171 0.0886 0.0398 0.2113 0.0150 0.0250
2.0 0.0171 0.1033 0.0170 0.1001 0.0402 0.2654 0.0151 0.0280
5.0 0.0177 0.2560 0.0177 0.2480 0.0406 0.9376 0.0156 0.0661
40 20
0.5
0.1 0.0029 0.1198 0.0029 0.1167 0.0239 0.0632 0.0148 0.0182
0.5 0.0188 0.1259 0.0188 0.1226 0.0412 0.0707 0.0317 0.0198
0.8 0.0187 0.1376 0.0187 0.1340 0.0409 0.0836 0.0314 0.0227
1.5 0.0199 0.1817 0.0199 0.1770 0.0420 0.1355 0.0325 0.0340
2.0 0.0194 0.2285 0.0194 0.2226 0.0411 0.1902 0.0318 0.0459
5.0 0.0190 0.8161 0.0190 0.7950 0.0408 0.8699 0.0315 0.1943
1.5
0.1 0.0211 0.0599 0.0211 0.0583 0.0511 0.1185 0.0217 0.0173
0.5 0.0135 0.0617 0.0135 0.0601 0.0397 0.1270 0.0130 0.0178
0.8 0.0155 0.0638 0.0155 0.0621 0.0430 0.1397 0.0154 0.0184
1.5 0.0152 0.0724 0.0152 0.0705 0.0418 0.1921 0.0147 0.0208
2.0 0.0146 0.0814 0.0146 0.0793 0.0417 0.2471 0.0144 0.0234
5.0 0.0149 0.1989 0.0149 0.1937 0.0426 0.9392 0.0149 0.0573
50 50
0.5
0.1 0.0086 0.0412 0.0086 0.0405 0.0137 0.0261 0.0086 0.0103
0.5 0.0086 0.0437 0.0086 0.0430 0.0137 0.0286 0.0086 0.0109
0.8 0.0096 0.0490 0.0096 0.0482 0.0148 0.0334 0.0096 0.0122
1.5 0.0104 0.0667 0.0104 0.0657 0.0155 0.0505 0.0104 0.0167
2.0 0.0105 0.0871 0.0105 0.0858 0.0155 0.0698 0.0105 0.0218
5.0 0.0105 0.3340 0.0105 0.3289 0.0157 0.3049 0.0105 0.0835
1.5
0.1 0.0065 0.0409 0.0065 0.0403 0.0127 0.0677 0.0065 0.0102
0.5 0.0085 0.0420 0.0085 0.0413 0.0146 0.0707 0.0085 0.0105
0.8 0.0097 0.0431 0.0097 0.0424 0.0159 0.0742 0.0097 0.0108
1.5 0.0094 0.0496 0.0094 0.0488 0.0162 0.0929 0.0094 0.0124
2.0 0.0084 0.0561 0.0084 0.0553 0.0157 0.1124 0.0084 0.0140
5.0 0.0087 0.1345 0.0087 0.1324 0.0155 0.3422 0.0087 0.0336
75 50
0.5
0.1 0.0081 0.0434 0.0081 0.0429 0.0150 0.0243 0.0108 0.0082
0.5 0.0061 0.0466 0.0061 0.0460 0.0128 0.0272 0.0087 0.0090
0.8 0.0076 0.0502 0.0076 0.0496 0.0144 0.0312 0.0103 0.0099
1.5 0.0088 0.0684 0.0088 0.0675 0.0157 0.0492 0.0115 0.0144
2.0 0.0085 0.0873 0.0085 0.0862 0.0152 0.0680 0.0112 0.0191
5.0 0.0087 0.3204 0.0087 0.3165 0.0153 0.2997 0.0113 0.0767
1.5
0.1 0.0023 0.0290 0.0023 0.0287 0.0109 0.0521 0.0016 0.0081
0.5 0.0047 0.0298 0.0047 0.0294 0.0138 0.0549 0.0041 0.0083
0.8 0.0068 0.0309 0.0068 0.0305 0.0150 0.0591 0.0059 0.0086
1.5 0.0060 0.0353 0.0060 0.0349 0.0145 0.0770 0.0052 0.0097
2.0 0.0060 0.0400 0.0060 0.0395 0.0141 0.0964 0.0051 0.0109
5.0 0.0071 0.0961 0.0071 0.0949 0.0164 0.3275 0.0065 0.0248
100 50
0.5
0.1 0.0053 0.0461 0.0053 0.0456 0.0132 0.0235 0.0099 0.0069
0.5 0.0073 0.0487 0.0073 0.0482 0.0153 0.0262 0.0120 0.0075
0.8 0.0065 0.0526 0.0065 0.0521 0.0147 0.0304 0.0113 0.0085
1.5 0.0077 0.0693 0.0077 0.0686 0.0160 0.0481 0.0126 0.0125
2.0 0.0082 0.0879 0.0082 0.0870 0.0163 0.0677 0.0129 0.0170
5.0 0.0073 0.3069 0.0073 0.3038 0.0153 0.3001 0.0120 0.0703
1.5
0.1 0.0061 0.0236 0.0061 0.0234 0.0160 0.0446 0.0060 0.0068
0.5 0.0048 0.0240 0.0048 0.0237 0.0147 0.0470 0.0047 0.0069
0.8 0.0053 0.0248 0.0053 0.0246 0.0159 0.0516 0.0054 0.0072
1.5 0.0059 0.0281 0.0059 0.0279 0.0159 0.0692 0.0058 0.0081
2.0 0.0057 0.0323 0.0057 0.0319 0.0156 0.0891 0.0056 0.0092
5.0 0.0054 0.0751 0.0054 0.0744 0.0156 0.3190 0.0054 0.0215
Table 4.3: Bias rates and variances of ES estimators under Assumption 3 (assume σ1=σ2=1, µ2 = 0; Skew-normal distribution)
n1 n2 | μ1 − μ2 | Cohen's ds: Bias rate, Var. | Hedges's gs: Bias rate, Var. | Shieh's d: Bias rate, Var. | dMAD: Bias rate, Var. | dR: Bias rate, Var. | PSindep: Bias rate, Var.
10 10 -1 0.0708 0.3170 0.0707 0.2908 0.0708 0.0793 0.2589 1.0999 0.0763 0.4445 0.0001 0.0118
15 10 -1 0.0567 0.2530 0.0567 0.2366 0.0546 0.0590 0.2633 1.0108 0.0631 0.3468 -0.0011 0.0094
20 10 -1 0.0436 0.2172 0.0435 0.2056 0.0483 0.0483 0.2721 0.9442 0.0507 0.2865 0.0017 0.0084
20 20 -1 0.0321 0.1450 0.0321 0.1393 0.0321 0.0363 0.1125 0.3592 0.0335 0.1945 -0.0002 0.0057
30 20 -1 0.0248 0.1181 0.0248 0.1144 0.0238 0.0274 0.1155 0.3264 0.0248 0.1542 0.0013 0.0046
40 20 -1 0.0219 0.1028 0.0219 0.1002 0.0241 0.0226 0.1158 0.3122 0.0222 0.1329 -0.0002 0.0041
50 50 -1 0.0112 0.0551 0.0112 0.0542 0.0112 0.0138 0.0406 0.1158 0.0097 0.0719 0.0012 0.0023
75 50 -1 0.0098 0.0452 0.0098 0.0447 0.0092 0.0104 0.0420 0.1016 0.0099 0.0584 0.0003 0.0018
100 50 -1 0.0087 0.0395 0.0087 0.0391 0.0093 0.0086 0.0455 0.0954 0.0084 0.0501 -0.0001 0.0016
"n1
"PSindep"dMADHedges’s !gs"n2 !μ1 − μ2
"dRCohen’s !ds
Page ! of !8 52
Table 4.4: Bias rates and variances of ES estimators under Assumption 4
n1 n2 | μ1 − μ2 | Cohen's ds: Bias rate, Var. | Hedges's gs: Bias rate, Var. | Shieh's d: Bias rate, Var. | dMAD: Bias rate, Var. | dR: Bias rate, Var. | PSindep: Bias rate, Var.
Skew-normal (σ1 = 2, σ2 = 1, μ2 = (0, −1))
10 10 -2 0.1142 0.4989 0.1141 0.4575 0.1142 0.1247 0.2512 3.4940 0.1288 0.7297 -0.0004 0.0108
15 10 -2 0.0832 0.3118 0.0831 0.2915 0.0679 0.0848 0.2609 3.3201 0.0950 0.4450 0.0008 0.0076
20 10 -2 0.0672 0.2244 0.0672 0.2124 0.0504 0.0640 0.2725 3.0968 0.0739 0.3166 0.0001 0.0059
20 20 -2 0.0545 0.2235 0.0545 0.2147 0.0545 0.0559 0.1086 1.1569 0.0584 0.3147 0.0013 0.0054
30 20 -2 0.0401 0.1408 0.0401 0.1364 0.0327 0.0388 0.1090 1.0043 0.0425 0.1946 0.0008 0.0037
40 20 -2 0.0309 0.1023 0.0309 0.0997 0.0231 0.0295 0.1148 0.9272 0.0350 0.1391 0.0001 0.0029
50 50 -2 0.0201 0.0818 0.0201 0.0805 0.0201 0.0204 0.0387 0.3662 0.0193 0.1114 0.0012 0.0021
75 50 -2 0.0155 0.0533 0.0155 0.0527 0.0126 0.0148 0.0386 0.3128 0.0166 0.0735 0.0004 0.0015
100 50 -2 0.0121 0.0394 0.0121 0.0390 0.0090 0.0115 0.0445 0.2960 0.0123 0.0529 0.0008 0.0012
10 10 -1 0.1468 0.3356 0.1467 0.3078 0.1468 0.0839 0.2148 1.8313 0.1535 0.4404 -0.0008 0.0158
15 10 -1 0.1065 0.2088 0.1064 0.1952 0.0817 0.0575 0.2527 1.5430 0.1192 0.2718 -0.0005 0.0110
20 10 -1 0.0859 0.1512 0.0859 0.1431 0.0570 0.0436 0.2522 1.2746 0.1017 0.1935 -0.0015 0.0087
20 20 -1 0.0763 0.1509 0.0763 0.1449 0.0763 0.0377 0.1022 0.6094 0.0759 0.1911 -0.0028 0.0078
30 20 -1 0.0497 0.0960 0.0497 0.0930 0.0381 0.0269 0.1070 0.4775 0.0568 0.1219 0.0003 0.0054
40 20 -1 0.0416 0.0711 0.0415 0.0692 0.0274 0.0207 0.1077 0.3962 0.0482 0.0887 -0.0008 0.0042
50 50 -1 0.0263 0.0570 0.0263 0.0561 0.0263 0.0142 0.0364 0.1995 0.0249 0.0696 0.0006 0.0031
75 50 -1 0.0216 0.0362 0.0216 0.0357 0.0170 0.0103 0.0358 0.1514 0.0230 0.0444 -0.0008 0.0021
100 50 -1 0.0170 0.0273 0.0170 0.0271 0.0113 0.0081 0.0400 0.1280 0.0183 0.0334 -0.0001 0.0017
Sinh-Arcsinh normal ((σ1, σ2) = {(1.6, 1); (1.15, 1); (1.6, 1.15)}, μ2 = (0, 0, −0.7))
10 10 1.6 0.0297 0.1748 0.0296 0.1603 0.0297 0.0437 0.3803 1.6376 0.0871 0.2517 -0.0003 0.0097
15 10 1.6 0.0245 0.1206 0.0245 0.1128 0.0193 0.0360 0.3415 1.3428 0.0665 0.1696 -0.0004 0.0079
20 10 1.6 0.0212 0.0979 0.0211 0.0927 0.0184 0.0323 0.3292 1.2347 0.0537 0.1351 -0.0003 0.0069
20 20 1.6 0.0144 0.0790 0.0144 0.0759 0.0144 0.0198 0.1696 0.5076 0.0422 0.1062 0.0002 0.0046
30 20 1.6 0.0127 0.0558 0.0127 0.0541 0.0101 0.0168 0.1498 0.4281 0.0316 0.0738 0.0002 0.0037
40 20 1.6 0.0119 0.0462 0.0119 0.0450 0.0098 0.0153 0.1425 0.3838 0.0275 0.0609 0.0001 0.0034
50 50 1.6 0.0055 0.0302 0.0055 0.0297 0.0055 0.0075 0.0634 0.1621 0.0164 0.0394 0.0000 0.0018
"n1
"PSindep"dMADHedges’s !gs"n2
" , " , "σ1 = 2 σ2 = 1 μ2 = (0, − 1)
!μ1 − μ2
"dR
" , "(σ1, σ2) = {(1.6,1); (1.15,1); (1.6,1.15)} μ2 = (0,0, − 0.7)
Cohen’s !ds
Page ! of !9 52
75 50 1.6 0.0050 0.0219 0.0050 0.0216 0.0038 0.0066 0.0569 0.1380 0.0121 0.0289 0.0000 0.0015
100 50 1.6 0.0048 0.0182 0.0048 0.0180 0.0041 0.0060 0.0516 0.1208 0.0108 0.0237 0.0000 0.0013
10 10 -0.7 0.0142 0.2079 0.0141 0.1907 0.0142 0.0520 0.3926 0.7648 0.0808 0.2689 -0.0009 0.0150
15 10 -0.7 0.0126 0.1610 0.0125 0.1505 0.0200 0.0425 0.3503 0.6677 0.0545 0.2034 -0.0008 0.0123
20 10 -0.7 0.0074 0.1378 0.0073 0.1304 0.0222 0.0364 0.3276 0.5905 0.0410 0.1709 0.0014 0.0110
20 20 -0.7 0.0056 0.0976 0.0056 0.0937 0.0056 0.0244 0.1579 0.2607 0.0347 0.1191 -0.0001 0.0074
30 20 -0.7 0.0047 0.0764 0.0047 0.0740 0.0083 0.0202 0.1494 0.2177 0.0244 0.0928 0.0002 0.0061
40 20 -0.7 0.0054 0.0659 0.0054 0.0642 0.0128 0.0174 0.1411 0.1987 0.0207 0.0798 -0.0003 0.0054
50 50 -0.7 0.0039 0.0375 0.0039 0.0369 0.0039 0.0094 0.0597 0.0868 0.0160 0.0447 -0.0011 0.0029
75 50 -0.7 0.0001 0.0295 0.0001 0.0291 0.0014 0.0078 0.0582 0.0728 0.0064 0.0353 0.0009 0.0024
100 50 -0.7 0.0014 0.0257 0.0014 0.0254 0.0040 0.0067 0.0557 0.0650 0.0076 0.0306 0.0003 0.0021
10 10 2.3 0.0346 0.1596 0.0345 0.1464 0.0346 0.0399 0.3451 1.5958 0.0915 0.2530 -0.0004 0.0045
15 10 2.3 0.0296 0.1207 0.0295 0.1129 0.0235 0.0317 0.3263 1.4340 0.0762 0.1823 -0.0001 0.0036
20 10 2.3 0.0251 0.1032 0.0251 0.0977 0.0222 0.0280 0.3154 1.3081 0.0640 0.1509 -0.0002 0.0032
20 20 2.3 0.0177 0.0746 0.0177 0.0716 0.0177 0.0186 0.1482 0.4827 0.0464 0.1088 0.0002 0.0021
30 20 2.3 0.0144 0.0567 0.0144 0.0549 0.0112 0.0148 0.1391 0.4067 0.0356 0.0803 0.0000 0.0018
40 20 2.3 0.0124 0.0488 0.0124 0.0476 0.0107 0.0130 0.1337 0.3669 0.0314 0.0680 0.0001 0.0015
50 50 2.3 0.0072 0.0287 0.0072 0.0282 0.0072 0.0072 0.0547 0.1466 0.0177 0.0406 0.0001 0.0008
75 50 2.3 0.0056 0.0223 0.0056 0.0220 0.0043 0.0057 0.0529 0.1252 0.0139 0.0307 -0.0001 0.0007
100 50 2.3 0.0044 0.0190 0.0044 0.0188 0.0035 0.0050 0.0457 0.1119 0.0124 0.0260 -0.0002 0.0006
Gamma ((σ1, σ2) = {(1.4, 1); (0.7, 1); (1.4, 0.7)}, μ2 = 1)
10 10 -1 0.1486 0.3816 0.1484 0.3500 0.1486 0.0954 0.2391 1.4308 0.0840 0.5045 0.0016 0.0130
15 10 -1 0.1215 0.2780 0.1214 0.2600 0.0930 0.0675 0.2522 1.2892 0.0690 0.3685 0.0005 0.0097
20 10 -1 0.1023 0.2191 0.1023 0.2074 0.0723 0.0523 0.2581 1.2084 0.0633 0.2935 -0.0014 0.0080
20 20 -1 0.0807 0.1788 0.0807 0.1718 0.0807 0.0447 0.0999 0.4561 0.0388 0.2261 -0.0026 0.0065
30 20 -1 0.0593 0.1275 0.0592 0.1235 0.0449 0.0312 0.1064 0.3935 0.0309 0.1652 0.0006 0.0048
40 20 -1 0.0502 0.1049 0.0502 0.1022 0.0335 0.0248 0.1108 0.3760 0.0241 0.1365 0.0011 0.0040
50 50 -1 0.0304 0.0674 0.0304 0.0663 0.0304 0.0168 0.0354 0.1454 0.0134 0.0837 0.0001 0.0025
75 50 -1 0.0247 0.0490 0.0247 0.0484 0.0187 0.0121 0.0389 0.1302 0.0117 0.0630 -0.0002 0.0019
100 50 -1 0.0194 0.0394 0.0194 0.0390 0.0127 0.0095 0.0429 0.1178 0.0094 0.0515 0.0006 0.0016
10 10 -2 0.0699 0.5993 0.0698 0.5496 0.0699 0.1498 0.2696 2.1928 0.0792 1.0108 -0.0034 0.0028
15 10 -2 0.0597 0.5455 0.0596 0.5100 0.0632 0.1156 0.2754 2.1580 0.0645 0.8701 -0.0030 0.0023
20 10 -2 0.0505 0.5064 0.0505 0.4793 0.0618 0.0983 0.2784 2.1929 0.0541 0.7874 0.0042 0.0021
20 20 -2 0.0329 0.2655 0.0329 0.2551 0.0329 0.0664 0.1141 0.6544 0.0336 0.4179 -0.0019 0.0013
30 20 -2 0.0274 0.2447 0.0274 0.2371 0.0282 0.0502 0.1163 0.6409 0.0250 0.3785 0.0008 0.0011
40 20 -2 0.0241 0.2283 0.0241 0.2224 0.0289 0.0423 0.1191 0.6215 0.0236 0.3475 0.0004 0.0010
50 50 -2 0.0124 0.0998 0.0124 0.0983 0.0124 0.0250 0.0429 0.2008 0.0109 0.1501 0.0010 0.0005
75 50 -2 0.0106 0.0931 0.0106 0.0920 0.0107 0.0188 0.0432 0.1963 0.0101 0.1388 0.0013 0.0004
100 50 -2 0.0096 0.0886 0.0096 0.0877 0.0109 0.0159 0.0426 0.1951 0.0077 0.1313 0.0013 0.0004
10 10 1 0.0482 0.1935 0.0481 0.1774 0.0482 0.0484 0.4475 3.3040 0.0765 0.2828 0.0003 0.0128
15 10 1 0.0378 0.1229 0.0377 0.1149 0.0414 0.0413 0.4147 2.9108 0.0452 0.1735 -0.0002 0.0101
20 10 1 0.0328 0.0914 0.0327 0.0865 0.0435 0.0372 0.3989 2.6204 0.0318 0.1261 0.0001 0.0086
20 20 1 0.0275 0.0858 0.0275 0.0824 0.0275 0.0214 0.1944 1.0507 0.0356 0.1214 0.0005 0.0061
30 20 1 0.0190 0.0561 0.0190 0.0543 0.0200 0.0192 0.1825 0.8873 0.0214 0.0767 -0.0002 0.0049
40 20 1 0.0175 0.0433 0.0174 0.0422 0.0223 0.0181 0.1692 0.8144 0.0133 0.0577 -0.0001 0.0042
50 50 1 0.0108 0.0323 0.0108 0.0318 0.0108 0.0081 0.0666 0.3309 0.0111 0.0439 0.0000 0.0024
75 50 1 0.0082 0.0215 0.0082 0.0213 0.0085 0.0074 0.0681 0.2695 0.0065 0.0284 -0.0002 0.0019
100 50 1 0.0071 0.0166 0.0071 0.0164 0.0087 0.0070 0.0629 0.2451 0.0046 0.0219 -0.0001 0.0017
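Among the robust measures benchmarked in Tables 4.3 and 4.4, the probability of superiority for independent groups (PSindep) has a particularly simple estimate: the Mann-Whitney U statistic divided by n1·n2. The Python sketch below assumes the usual convention that the statistic estimates P(X > Y), with ties counted as 1/2:

```python
import numpy as np
from scipy.stats import mannwhitneyu

def ps_indep(x, y):
    """Estimate P(X > Y), ties counted as 1/2, via the Mann-Whitney U statistic."""
    u = mannwhitneyu(x, y, alternative="two-sided").statistic
    return u / (len(x) * len(y))
```

For two completely separated samples the estimate is 1, and for identical samples it is 0.5; its bounded 0–1 scale explains the very small variances in the PSindep columns.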
Table 4.5: NC and BS BCa CIs under Normal distributions (assume n1=n2=10, σ1=σ2=1)
True parameter (δ) | Statistic | NC (ds) | NC (Shieh's d) | BS BCa (ds) | BS BCa (gs)
δ = 0
% of Coverage 0.9497 0.9512 0.9596 0.9188
Mean low bound -0.8885 -0.4449 -0.9571 -0.9283
Median low bound -0.8765 -0.4383 -0.9495 -0.9055
SD low bound 0.4678 0.2337 0.4679 0.5643
Mean up bound 0.8935 0.4474 0.9613 0.9344
Median up bound 0.8810 0.4405 0.9526 0.9102
SD up bound 0.4682 0.2339 0.4677 0.5645
Mean Width 1.7821 0.8923 1.9184 1.8626
Median Width 1.7665 0.8832 1.9053 1.8366
SD Width 0.0398 0.0219 0.1409 0.1608
Empirical power 0.9509 0.9522 0.9596 0.9188
δ = 0.2
% of Coverage 0.9484 0.9498 0.9571 0.9156
Mean low bound -0.6874 -0.3446 -0.7582 -0.6911
Median low bound -0.6798 -0.3399 -0.7626 -0.6766
SD low bound 0.4608 0.2296 0.4668 0.5529
Mean up bound 1.1001 0.5507 1.1630 1.1772
Median up bound 1.0823 0.5411 1.1436 1.1441
SD up bound 0.4857 0.2432 0.4813 0.5883
Mean Width 1.7875 0.8953 1.9212 1.8682
Median Width 1.7710 0.8855 1.9065 1.8390
SD Width 0.0472 0.0260 0.1494 0.1725
Empirical power 0.0713 0.0691 0.0587 0.1114
δ = 0.5
% of Coverage 0.9489 0.9505 0.9548 0.9113
Mean low bound -0.3910 -0.1971 -0.4590 -0.3409
Median low bound -0.3935 -0.1968 -0.4771 -0.3404
SD low bound 0.4496 0.2232 0.4711 0.5375
Mean up bound 1.4228 0.7125 1.4751 1.5533
Median up bound 1.3953 0.6977 1.4416 1.5050
SD up bound 0.5110 0.2567 0.5051 0.6245
Mean Width 1.8138 0.9096 1.9341 1.8942
Median Width 1.7889 0.8967 1.9127 1.8495
SD Width 0.0727 0.0399 0.1843 0.2217
Empirical power 0.1836 0.1799 0.1562 0.2532
δ = 1
% of Coverage 0.9496 0.9526 0.9440 0.8984
Mean low bound 0.0825 0.0373 0.0394 0.2195
Median low bound 0.0671 0.0313 0.0024 0.1977
SD low bound 0.4451 0.2199 0.5007 0.5314
Mean up bound 1.9880 0.9970 2.0152 2.2085
Median up bound 1.9409 0.9727 1.9592 2.1361
SD up bound 0.5677 0.2865 0.5680 0.7049
Mean Width 1.9055 0.9596 1.9758 1.9890
Median Width 1.8738 0.9414 1.9376 1.9151
SD Width 0.1280 0.0704 0.2727 0.3398
Empirical power 0.5601 0.5546 0.5022 0.6551
δ = 2
% of Coverage 0.9499 0.9549 0.9135 0.8692
Mean low bound 0.9532 0.4660 1.0174 1.2456
Median low bound 0.9213 0.4494 0.9733 1.2047
SD low bound 0.4791 0.2356 0.5852 0.5744
Mean up bound 3.1885 1.6038 3.1591 3.5858
Median up bound 3.1126 1.5652 3.0740 3.4730
SD up bound 0.7226 0.3660 0.7370 0.9173
Mean Width 2.2352 1.1378 2.1416 2.3402
Median Width 2.1913 1.1136 2.0754 2.2329
SD Width 0.2455 0.1342 0.4551 0.5727
Empirical power 0.9867 0.9862 0.9753 0.9933
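The BS BCa columns in Tables 4.5–4.16 are bias-corrected-and-accelerated bootstrap intervals for the standardized mean difference. A sketch using SciPy's built-in BCa implementation follows; the resample count, seed, and simulated data are arbitrary illustrative choices, not the thesis's simulation settings:

```python
import numpy as np
from scipy import stats

def cohens_ds(x, y):
    """Cohen's ds with the pooled-SD denominator."""
    n1, n2 = len(x), len(y)
    sp = np.sqrt(((n1 - 1) * np.var(x, ddof=1) + (n2 - 1) * np.var(y, ddof=1))
                 / (n1 + n2 - 2))
    return (np.mean(x) - np.mean(y)) / sp

rng = np.random.default_rng(7)
x = rng.normal(0.5, 1.0, size=20)  # simulated "treatment" group (assumption)
y = rng.normal(0.0, 1.0, size=20)  # simulated "control" group (assumption)

# 95% BCa interval: resamples each group independently, then adjusts the
# percentile interval for bias and skewness of the bootstrap distribution
res = stats.bootstrap((x, y), cohens_ds, method="BCa", vectorized=False,
                      n_resamples=2000, confidence_level=0.95, random_state=rng)
ci = res.confidence_interval  # ci.low and ci.high bracket the ds estimate
```

Passing method="percentile" or method="basic" instead yields the uncorrected bootstrap intervals, which is useful for seeing how much the BCa adjustment matters under skewed distributions.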
Table 4.6: NC and BS BCa CIs under Normal distributions (assume n1=n2=10, σ1=2, σ2=1)
True parameter (δ) | Statistic | NC (ds) | NC (Shieh's d) | BS BCa (ds) | BS BCa (gs)
δ = 0
% of Coverage 0.9452 0.9500 0.9583 0.9124
Mean low bound -0.8908 -0.4475 -0.9854 -0.9606
Median low bound -0.8765 -0.4383 -0.9765 -0.9274
SD low bound 0.4784 0.2384 0.4820 0.6048
Mean up bound 0.8924 0.4483 0.9870 0.9628
Median up bound 0.8765 0.4383 0.9772 0.9299
SD up bound 0.4786 0.2386 0.4818 0.6049
Mean Width 1.7832 0.8958 1.9724 1.9233
Median Width 1.7665 0.8855 1.9514 1.8838
SD Width 0.0419 0.0267 0.1807 0.2128
Empirical power 0.9464 0.9510 0.9583 0.9124
δ = 0.2
% of Coverage 0.9456 0.9502 0.9572 0.9103
Mean low bound -0.6840 -0.3449 -0.7814 -0.7082
Median low bound -0.6798 -0.3399 -0.7903 -0.6882
SD low bound 0.4663 0.2307 0.4767 0.5823
Mean up bound 1.1043 0.5543 1.1935 1.2206
Median up bound 1.0823 0.5411 1.1665 1.1752
SD up bound 0.4925 0.2473 0.4939 0.6294
Mean Width 1.7884 0.8992 1.9748 1.9288
Median Width 1.7710 0.8877 1.9520 1.8854
SD Width 0.0492 0.0311 0.1898 0.2274
Empirical power 0.0748 0.0688 0.0586 0.1165
δ = 0.5
% of Coverage 0.9450 0.9503 0.9532 0.9058
Mean low bound -0.3868 -0.1989 -0.4821 -0.3472
Median low bound -0.3935 -0.2012 -0.5119 -0.3487
SD low bound 0.4565 0.2235 0.4861 0.5629
Mean up bound 1.4284 0.7177 1.5077 1.6145
Median up bound 1.3908 0.6977 1.4551 1.5477
SD up bound 0.5212 0.2643 0.5273 0.6761
Mean Width 1.8152 0.9166 1.9898 1.9617
Median Width 1.7889 0.8989 1.9580 1.8997
SD Width 0.0766 0.0482 0.2370 0.2897
Empirical power 0.1873 0.1751 0.1499 0.2581
δ = 1
% of Coverage 0.9423 0.9484 0.9371 0.8856
Mean low bound 0.0917 0.0329 0.0246 0.2309
Median low bound 0.0716 0.0246 -0.0221 0.2085
SD low bound 0.4622 0.2241 0.5346 0.5592
Mean up bound 2.0017 1.0107 2.0662 2.3130
Median up bound 1.9499 0.9816 1.9953 2.2253
SD up bound 0.5928 0.3034 0.6178 0.7865
Mean Width 1.9100 0.9778 2.0417 2.0821
Median Width 1.8738 0.9570 1.9865 1.9782
SD Width 0.1365 0.0838 0.3545 0.4424
Empirical power 0.5629 0.5407 0.4833 0.6545
δ = 2
% of Coverage 0.9362 0.9464 0.8975 0.8486
Mean low bound 0.9786 0.4541 1.0320 1.2766
Median low bound 0.9347 0.4293 0.9816 1.2210
SD low bound 0.5157 0.2512 0.6377 0.6146
Mean up bound 3.2298 1.6468 3.2708 3.7966
Median up bound 3.1350 1.5988 3.1659 3.6577
SD up bound 0.7845 0.4014 0.8330 1.0475
Mean Width 2.2512 1.1928 2.2388 2.5200
Median Width 2.2003 1.1672 2.1416 2.3746
SD Width 0.2710 0.1572 0.5756 0.7277
Empirical power 0.9866 0.9834 0.9650 0.9936
Table 4.7: NC and BS BCa CIs under Normal distributions (n1=20, n2=10, σ1=σ2=1)
Statistic NC(d_s) NC(Shieh’s d) BS BCa(d_s) BS BCa(g_s)
δ = 0
% of Coverage 0.9500 0.9494 0.9498 0.9220
Mean low bound -0.7708 -0.3651 -0.7958 -0.7774
Median low bound -0.7630 -0.3597 -0.7942 -0.7691
SD low bound 0.3986 0.1909 0.4072 0.4525
Mean up bound 0.7640 0.3619 0.7888 0.7697
Median up bound 0.7552 0.3560 0.7849 0.7588
SD up bound 0.3983 0.1907 0.4060 0.4511
Mean Width 1.5348 0.7270 1.5845 1.5471
Median Width 1.5260 0.7212 1.5716 1.5340
SD Width 0.0206 0.0148 0.1787 0.1723
Empirical power 0.9510 0.9507 0.9498 0.9220
δ = 0.2
% of Coverage 0.9484 0.9489 0.9483 0.9211
Mean low bound -0.5645 -0.2674 -0.5907 -0.5490
Median low bound -0.5616 -0.2647 -0.5916 -0.5427
SD low bound 0.3934 0.1868 0.4062 0.4457
Mean up bound 0.9741 0.4624 0.9955 1.0021
Median up bound 0.9644 0.4546 0.9875 0.9856
SD up bound 0.4086 0.1975 0.4134 0.4638
Mean Width 1.5386 0.7297 1.5862 1.5511
Median Width 1.5298 0.7230 1.5727 1.5362
SD Width 0.0258 0.0184 0.1819 0.1769
Empirical power 0.0784 0.0775 0.0792 0.1157
δ = 0.5
% of Coverage 0.9513 0.9502 0.9485 0.9201
Mean low bound -0.2696 -0.1293 -0.2959 -0.2223
Median low bound -0.2711 -0.1315 -0.3043 -0.2232
SD low bound 0.3852 0.1817 0.4044 0.4351
Mean up bound 1.2879 0.6138 1.3038 1.3486
Median up bound 1.2703 0.6025 1.2901 1.3273
SD up bound 0.4223 0.2076 0.4248 0.4824
Mean Width 1.5574 0.7432 1.5997 1.5709
Median Width 1.5453 0.7339 1.5835 1.5525
SD Width 0.0421 0.0299 0.1938 0.1939
Empirical power 0.2343 0.2268 0.2264 0.2996
δ = 1
% of Coverage 0.9514 0.9485 0.9424 0.9130
Mean low bound 0.2110 0.0921 0.1937 0.3108
Median low bound 0.2014 0.0840 0.1796 0.2996
SD low bound 0.3857 0.1836 0.4196 0.4355
Mean up bound 1.8355 0.8825 1.8365 1.9497
Median up bound 1.8087 0.8654 1.8096 1.9134
SD up bound 0.4606 0.2334 0.4616 0.5288
Mean Width 1.6245 0.7904 1.6427 1.6388
Median Width 1.6073 0.7796 1.6178 1.6081
SD Width 0.0775 0.0538 0.2307 0.2431
Empirical power 0.7021 0.6816 0.6696 0.7627
δ = 2
% of Coverage 0.9495 0.9448 0.9275 0.8954
Mean low bound 1.1093 0.4971 1.1368 1.3060
Median low bound 1.0844 0.4783 1.1123 1.2802
SD low bound 0.4176 0.2117 0.4746 0.4729
Mean up bound 2.9791 1.4537 2.9444 3.1959
Median up bound 2.9318 1.4259 2.8948 3.1358
SD up bound 0.5684 0.3009 0.5725 0.6605
Mean Width 1.8697 0.9566 1.8076 1.8898
Median Width 1.8474 0.9421 1.7664 1.8365
SD Width 0.1518 0.1004 0.3224 0.3635
Empirical power 0.9987 0.9981 0.9962 0.9992
Table 4.8: NC and BS BCa CIs under Normal distributions (n1=20, n2=10, σ1=2, σ2=1)
Statistic NC(d_s) NC(Shieh’s d) BS BCa(d_s) BS BCa(g_s)
δ = 0
% of Coverage 0.9808 0.9497 0.9563 0.9283
Mean low bound -0.7643 -0.3614 -0.6538 -0.6403
Median low bound -0.7591 -0.3578 -0.6447 -0.6273
SD low bound 0.3264 0.1881 0.3315 0.3709
Mean up bound 0.7660 0.3626 0.6555 0.6422
Median up bound 0.7591 0.3578 0.6472 0.6290
SD up bound 0.3264 0.1881 0.3311 0.3704
Mean Width 1.5303 0.7240 1.3093 1.2825
Median Width 1.5260 0.7193 1.2977 1.2687
SD Width 0.0141 0.0104 0.1102 0.1095
Empirical power 0.9813 0.9508 0.9563 0.9283
δ = 0.2
% of Coverage 0.9815 0.9517 0.9569 0.9291
Mean low bound -0.5609 -0.2444 -0.4514 -0.4141
Median low bound -0.5616 -0.2428 -0.4482 -0.4063
SD low bound 0.3204 0.1835 0.3240 0.3583
Mean up bound 0.9732 0.4824 0.8615 0.8736
Median up bound 0.9644 0.4765 0.8470 0.8549
SD up bound 0.3329 0.1927 0.3406 0.3841
Mean Width 1.5340 0.7268 1.3129 1.2877
Median Width 1.5260 0.7212 1.3010 1.2729
SD Width 0.0189 0.0139 0.1154 0.1173
Empirical power 0.0410 0.0915 0.0823 0.1250
δ = 0.5
% of Coverage 0.9798 0.9509 0.9516 0.9238
Mean low bound -0.2636 -0.0742 -0.1544 -0.0867
Median low bound -0.2711 -0.0767 -0.1636 -0.0885
SD low bound 0.3202 0.1813 0.3260 0.3514
Mean up bound 1.2901 0.6671 1.1776 1.2287
Median up bound 1.2742 0.6591 1.1555 1.2015
SD up bound 0.3517 0.2044 0.3654 0.4142
Mean Width 1.5536 0.7413 1.3320 1.3154
Median Width 1.5453 0.7339 1.3159 1.2931
SD Width 0.0346 0.0253 0.1398 0.1512
Empirical power 0.1948 0.3324 0.3042 0.3974
δ = 1
% of Coverage 0.9771 0.9516 0.9408 0.9134
Mean low bound 0.2165 0.1977 0.3312 0.4348
Median low bound 0.2014 0.1917 0.3125 0.4201
SD low bound 0.3312 0.1831 0.3452 0.3560
Mean up bound 1.8382 0.9890 1.7292 1.8474
Median up bound 1.8087 0.9749 1.6974 1.8075
SD up bound 0.3969 0.2295 0.4203 0.4791
Mean Width 1.6216 0.7913 1.3980 1.4126
Median Width 1.6073 0.7814 1.3737 1.3767
SD Width 0.0674 0.0480 0.2011 0.2292
Empirical power 0.7362 0.8639 0.8359 0.8984
δ = 2
% of Coverage 0.9669 0.9526 0.9210 0.8914
Mean low bound 1.1204 0.7047 1.2546 1.3994
Median low bound 1.0883 0.6901 1.2254 1.3673
SD low bound 0.3846 0.2062 0.4137 0.4121
Mean up bound 2.9920 1.6719 2.8967 3.1535
Median up bound 2.9396 1.6450 2.8436 3.0878
SD up bound 0.5254 0.2976 0.5589 0.6425
Mean Width 1.8716 0.9673 1.6421 1.7541
Median Width 1.8474 0.9549 1.5957 1.6927
SD Width 0.1415 0.0939 0.3295 0.3847
Empirical power 0.9999 1.0000 0.9999 1.0000
Table 4.9: NC and BS BCa CIs under Normal distributions (n1=75, n2=50, σ1=σ2=1)
Statistic NC(d_s) NC(Shieh’s d) BS BCa(d_s) BS BCa(g_s)
δ = 0
% of Coverage 0.9493 0.9490 0.9507 0.9434
Mean low bound -0.3585 -0.1757 -0.3595 -0.3574
Median low bound -0.3578 -0.1753 -0.3587 -0.3563
SD low bound 0.1837 0.0901 0.1837 0.1886
Mean up bound 0.3598 0.1764 0.3609 0.3588
Median up bound 0.3597 0.1762 0.3603 0.3577
SD up bound 0.1837 0.0901 0.1838 0.1888
Mean Width 0.7183 0.3520 0.7204 0.7162
Median Width 0.7175 0.3515 0.7194 0.7156
SD Width 0.0020 0.0012 0.0342 0.0299
Empirical power 0.9504 0.9504 0.9507 0.9434
δ = 0.2
% of Coverage 0.9487 0.9492 0.9494 0.9421
Mean low bound -0.1588 -0.0779 -0.1613 -0.1534
Median low bound -0.1588 -0.0778 -0.1612 -0.1527
SD low bound 0.1833 0.0898 0.1833 0.1880
Mean up bound 0.5611 0.2751 0.5606 0.5644
Median up bound 0.5605 0.2746 0.5598 0.5631
SD up bound 0.1862 0.0915 0.1861 0.1914
Mean Width 0.7199 0.3530 0.7219 0.7178
Median Width 0.7193 0.3524 0.7208 0.7169
SD Width 0.0036 0.0021 0.0351 0.0310
Empirical power 0.1899 0.1898 0.1879 0.2064
δ = 0.5
% of Coverage 0.9503 0.9499 0.9501 0.9432
Mean low bound 0.1373 0.0669 0.1324 0.1487
Median low bound 0.1369 0.0662 0.1308 0.1474
SD low bound 0.1828 0.0896 0.1834 0.1875
Mean up bound 0.8655 0.4248 0.8627 0.8753
Median up bound 0.8618 0.4231 0.8593 0.8719
SD up bound 0.1905 0.0940 0.1905 0.1961
Mean Width 0.7283 0.3578 0.7303 0.7266
Median Width 0.7266 0.3569 0.7284 0.7247
SD Width 0.0081 0.0047 0.0396 0.0362
Empirical power 0.7704 0.7696 0.7647 0.7858
δ = 1
% of Coverage 0.9487 0.9488 0.9471 0.9394
Mean low bound 0.6260 0.3052 0.6194 0.6484
Median low bound 0.6226 0.3032 0.6161 0.6452
SD low bound 0.1873 0.0922 0.1895 0.1929
Mean up bound 1.3860 0.6812 1.3788 1.4064
Median up bound 1.3821 0.6789 1.3740 1.4016
SD up bound 0.2036 0.1013 0.2038 0.2099
Mean Width 0.7600 0.3760 0.7594 0.7580
Median Width 0.7577 0.3748 0.7560 0.7546
SD Width 0.0164 0.0094 0.0524 0.0502
Empirical power 0.9997 0.9998 0.9997 0.9998
δ = 2
% of Coverage 0.9513 0.9506 0.9443 0.9362
Mean low bound 1.5714 0.7645 1.5647 1.6150
Median low bound 1.5647 0.7603 1.5575 1.6072
SD low bound 0.2083 0.1046 0.2136 0.2154
Mean up bound 2.4469 1.2058 2.4299 2.4866
Median up bound 2.4374 1.2003 2.4199 2.4766
SD up bound 0.2406 0.1216 0.2418 0.2495
Mean Width 0.8755 0.4413 0.8652 0.8715
Median Width 0.8727 0.4401 0.8589 0.8651
SD Width 0.0324 0.0184 0.0829 0.0817
Empirical power 1.0000 1.0000 1.0000 1.0000
Table 4.10: NC and BS BCa CIs under Normal distributions (n1=75, n2=50, σ1=2, σ2=1)
Statistic NC(d_s) NC(Shieh’s d) BS BCa(d_s) BS BCa(g_s)
δ = 0
% of Coverage 0.9722 0.9498 0.9520 0.9444
Mean low bound -0.3587 -0.1758 -0.3201 -0.3183
Median low bound -0.3578 -0.1753 -0.3191 -0.3169
SD low bound 0.1635 0.0903 0.1636 0.1684
Mean up bound 0.3593 0.1761 0.3208 0.3189
Median up bound 0.3597 0.1762 0.3203 0.3182
SD up bound 0.1635 0.0903 0.1636 0.1684
Mean Width 0.7180 0.3519 0.6409 0.6372
Median Width 0.7175 0.3515 0.6400 0.6365
SD Width 0.0017 0.0011 0.0281 0.0237
Empirical power 0.9729 0.9512 0.9520 0.9444
δ = 0.2
% of Coverage 0.9715 0.9490 0.9507 0.9434
Mean low bound -0.1588 -0.0654 -0.1218 -0.1137
Median low bound -0.1588 -0.0662 -0.1223 -0.1138
SD low bound 0.1623 0.0894 0.1615 0.1661
Mean up bound 0.5608 0.2876 0.5215 0.5261
Median up bound 0.5587 0.2862 0.5192 0.5231
SD up bound 0.1649 0.0912 0.1659 0.1711
Mean Width 0.7196 0.3530 0.6432 0.6399
Median Width 0.7193 0.3524 0.6421 0.6388
SD Width 0.0031 0.0021 0.0298 0.0259
Empirical power 0.1598 0.2281 0.2235 0.2464
δ = 0.5
% of Coverage 0.9711 0.9510 0.9511 0.9434
Mean low bound 0.1398 0.0989 0.1731 0.1901
Median low bound 0.1388 0.0984 0.1712 0.1882
SD low bound 0.1632 0.0894 0.1623 0.1660
Mean up bound 0.8679 0.4577 0.8280 0.8425
Median up bound 0.8636 0.4562 0.8243 0.8387
SD up bound 0.1702 0.0940 0.1722 0.1778
Mean Width 0.7281 0.3588 0.6548 0.6524
Median Width 0.7266 0.3578 0.6527 0.6501
SD Width 0.0073 0.0048 0.0371 0.0348
Empirical power 0.8028 0.8661 0.8594 0.8754
δ = 1
% of Coverage 0.9668 0.9492 0.9464 0.9376
Mean low bound 0.6267 0.3662 0.6536 0.6838
Median low bound 0.6226 0.3640 0.6486 0.6790
SD low bound 0.1715 0.0928 0.1715 0.1739
Mean up bound 1.3865 0.7459 1.3484 1.3792
Median up bound 1.3803 0.7433 1.3423 1.3728
SD up bound 0.1864 0.1023 0.1903 0.1964
Mean Width 0.7598 0.3798 0.6948 0.6955
Median Width 0.7577 0.3792 0.6903 0.6907
SD Width 0.0151 0.0097 0.0556 0.0547
Empirical power 1.0000 1.0000 1.0000 1.0000
δ = 2
% of Coverage 0.9557 0.9499 0.9404 0.9320
Mean low bound 1.5762 0.8856 1.5863 1.6387
Median low bound 1.5683 0.8819 1.5783 1.6303
SD low bound 0.2033 0.1073 0.2066 0.2066
Mean up bound 2.4524 1.3400 2.4240 2.4867
Median up bound 2.4410 1.3345 2.4136 2.4754
SD up bound 0.2349 0.1259 0.2409 0.2493
Mean Width 0.8762 0.4544 0.8376 0.8480
Median Width 0.8745 0.4535 0.8281 0.8388
SD Width 0.0317 0.0191 0.0955 0.0953
Empirical power 1.0000 1.0000 1.0000 1.0000
Table 4.11: NC and BS BCa CIs under Skew-normal distributions (n1=10, n2=10, σ1=1, σ2=1)
Statistic NC(d_s) NC(Shieh’s d) BS BCa(d_s) BS BCa(g_s)
δ = 0
% of Coverage 0.9480 0.9497 0.9589 0.9168
Mean low bound -0.8678 -0.4347 -0.9279 -0.8583
Median low bound -0.8676 -0.4338 -0.9343 -0.8493
SD low bound 0.4672 0.2330 0.4452 0.5305
Mean up bound 0.9146 0.4580 1.0003 1.0209
Median up bound 0.8855 0.4427 0.9747 0.9760
SD up bound 0.4752 0.2377 0.5054 0.6192
Mean Width 1.7825 0.8927 1.9283 1.8792
Median Width 1.7665 0.8832 1.9091 1.8418
SD Width 0.0409 0.0232 0.1600 0.1923
Empirical power 0.9491 0.9508 0.9589 0.9168
δ = 0.2
% of Coverage 0.9446 0.9464 0.9560 0.9129
Mean low bound -0.6654 -0.3338 -0.7384 -0.6345
Median low bound -0.6753 -0.3376 -0.7586 -0.6371
SD low bound 0.4690 0.2332 0.4552 0.5304
Mean up bound 1.1246 0.5631 1.2200 1.2863
Median up bound 1.0867 0.5434 1.1831 1.2293
SD up bound 0.5024 0.2520 0.5310 0.6566
Mean Width 1.7900 0.8969 1.9584 1.9207
Median Width 1.7710 0.8855 1.9284 1.8653
SD Width 0.0533 0.0302 0.1832 0.2299
Empirical power 0.0821 0.0800 0.0616 0.1199
δ = 0.5
% of Coverage 0.9374 0.9401 0.9480 0.9031
Mean low bound -0.3686 -0.1864 -0.4526 -0.3042
Median low bound -0.3846 -0.1923 -0.4889 -0.3237
SD low bound 0.4747 0.2351 0.4802 0.5366
Mean up bound 1.4514 0.7271 1.5566 1.6929
Median up bound 1.4043 0.7021 1.5045 1.6209
SD up bound 0.5486 0.2762 0.5763 0.7175
Mean Width 1.8200 0.9135 2.0092 1.9970
Median Width 1.7889 0.8967 1.9635 1.9146
SD Width 0.0856 0.0482 0.2330 0.3020
Empirical power 0.2064 0.2025 0.1605 0.2686
δ = 1
% of Coverage 0.9298 0.9334 0.9330 0.8862
Mean low bound 0.1032 0.0470 0.0217 0.2239
Median low bound 0.0760 0.0358 -0.0338 0.1850
SD low bound 0.4883 0.2406 0.5340 0.5588
Mean up bound 2.0195 1.0133 2.1367 2.3939
Median up bound 1.9543 0.9794 2.0645 2.2961
SD up bound 0.6306 0.3189 0.6604 0.8264
Mean Width 1.9164 0.9663 2.1150 2.1700
Median Width 1.8738 0.9436 2.0492 2.0629
SD Width 0.1488 0.0830 0.3355 0.4402
Empirical power 0.5625 0.5572 0.4739 0.6409
δ = 2
% of Coverage 0.9184 0.9251 0.9019 0.8534
Mean low bound 0.9797 0.4781 0.9698 1.2160
Median low bound 0.9302 0.4562 0.9022 1.1516
SD low bound 0.5500 0.2695 0.6597 0.6488
Mean up bound 3.2353 1.6284 3.3539 3.8576
Median up bound 3.1305 1.5742 3.2429 3.7112
SD up bound 0.8365 0.4247 0.8690 1.0911
Mean Width 2.2556 1.1503 2.3841 2.6416
Median Width 2.2003 1.1180 2.2889 2.4952
SD Width 0.2891 0.1596 0.5421 0.7112
Empirical power 0.9798 0.9788 0.9531 0.9885
Table 4.12: NC and BS BCa CIs under Skew-normal distributions (n1=10, n2=10, σ1=2, σ2=1)
Statistic NC(d_s) NC(Shieh’s d) BS BCa(d_s) BS BCa(g_s)
δ = 0
% of Coverage 0.9442 0.9489 0.9568 0.9117
Mean low bound -0.8870 -0.4458 -0.9752 -0.9352
Median low bound -0.8810 -0.4405 -0.9732 -0.9116
SD low bound 0.4790 0.2379 0.4687 0.5848
Mean up bound 0.8963 0.4503 0.9986 0.9921
Median up bound 0.8765 0.4383 0.9837 0.9472
SD up bound 0.4818 0.2410 0.5006 0.6334
Mean Width 1.7834 0.8961 1.9739 1.9273
Median Width 1.7665 0.8855 1.9523 1.8863
SD Width 0.0430 0.0278 0.1842 0.2209
Empirical power 0.9454 0.9502 0.9568 0.9117
δ = 0.2
% of Coverage 0.9444 0.9494 0.9571 0.9113
Mean low bound -0.6780 -0.3423 -0.7737 -0.6881
Median low bound -0.6798 -0.3399 -0.7869 -0.6780
SD low bound 0.4685 0.2309 0.4674 0.5687
Mean up bound 1.1110 0.5578 1.2126 1.2592
Median up bound 1.0823 0.5411 1.1769 1.1985
SD up bound 0.4976 0.2508 0.5162 0.6635
Mean Width 1.7890 0.9001 1.9863 1.9473
Median Width 1.7710 0.8877 1.9584 1.8948
SD Width 0.0513 0.0340 0.2015 0.2505
Empirical power 0.0773 0.0710 0.0587 0.1166
δ = 0.5
% of Coverage 0.9398 0.9460 0.9497 0.9016
Mean low bound -0.3771 -0.1949 -0.4754 -0.3299
Median low bound -0.3846 -0.1968 -0.5105 -0.3355
SD low bound 0.4680 0.2280 0.4867 0.5611
Mean up bound 1.4408 0.7246 1.5407 1.6732
Median up bound 1.3998 0.7021 1.4796 1.5923
SD up bound 0.5376 0.2737 0.5651 0.7318
Mean Width 1.8180 0.9195 2.0160 2.0031
Median Width 1.7889 0.9011 1.9736 1.9199
SD Width 0.0821 0.0538 0.2570 0.3316
Empirical power 0.1969 0.1833 0.1526 0.2644
δ = 1
% of Coverage 0.9369 0.9453 0.9358 0.8856
Mean low bound 0.0976 0.0344 0.0199 0.2305
Median low bound 0.0716 0.0224 -0.0365 0.1982
SD low bound 0.4753 0.2284 0.5374 0.5627
Mean up bound 2.0108 1.0165 2.1090 2.3837
Median up bound 1.9454 0.9816 2.0241 2.2712
SD up bound 0.6138 0.3162 0.6595 0.8494
Mean Width 1.9132 0.9820 2.0892 2.1532
Median Width 1.8738 0.9593 2.0236 2.0326
SD Width 0.1448 0.0930 0.3697 0.4847
Empirical power 0.5595 0.5383 0.4723 0.6494
δ = 2
% of Coverage 0.9271 0.9409 0.8983 0.8522
Mean low bound 0.9819 0.4535 1.0059 1.2534
Median low bound 0.9302 0.4271 0.9482 1.1927
SD low bound 0.5362 0.2573 0.6451 0.6275
Mean up bound 3.2368 1.6526 3.3388 3.9029
Median up bound 3.1260 1.5966 3.2111 3.7354
SD up bound 0.8178 0.4223 0.8947 1.1412
Mean Width 2.2549 1.1991 2.3329 2.6495
Median Width 2.1958 1.1672 2.2305 2.4869
SD Width 0.2840 0.1728 0.5991 0.7876
Empirical power 0.9846 0.9816 0.9619 0.9923
Table 4.13: NC and BS BCa CIs under Skew-normal distributions (n1=20, n2=10, σ1=1, σ2=1)
Statistic NC(d_s) NC(Shieh’s d) BS BCa(d_s) BS BCa(g_s)
δ = 0
% of Coverage 0.9491 0.9458 0.9491 0.9197
Mean low bound -0.7537 -0.3503 -0.7980 -0.7380
Median low bound -0.7514 -0.3542 -0.8063 -0.7343
SD low bound 0.3981 0.1922 0.4183 0.4556
Mean up bound 0.7812 0.3767 0.7878 0.8109
Median up bound 0.7707 0.3633 0.7767 0.7955
SD up bound 0.3998 0.1938 0.3977 0.4511
Mean Width 1.5349 0.7270 1.5857 1.5489
Median Width 1.5260 0.7212 1.5725 1.5349
SD Width 0.0207 0.0148 0.1836 0.1776
Empirical power 0.9502 0.9470 0.9491 0.9197
δ = 0.2
% of Coverage 0.9460 0.9391 0.9465 0.9168
Mean low bound -0.5528 -0.2547 -0.6060 -0.5244
Median low bound -0.5538 -0.2611 -0.6161 -0.5260
SD low bound 0.3983 0.1945 0.4255 0.4556
Mean up bound 0.9867 0.4756 1.0007 1.0481
Median up bound 0.9682 0.4583 0.9819 1.0262
SD up bound 0.4154 0.2069 0.4136 0.4709
Mean Width 1.5395 0.7302 1.6066 1.5724
Median Width 1.5298 0.7230 1.5908 1.5543
SD Width 0.0272 0.0192 0.1985 0.1946
Empirical power 0.0852 0.0983 0.0844 0.1304
δ = 0.5
% of Coverage 0.9407 0.9267 0.9424 0.9116
Mean low bound -0.2557 -0.1150 -0.3196 -0.2096
Median low bound -0.2595 -0.1260 -0.3352 -0.2161
SD low bound 0.4031 0.2004 0.4418 0.4604
Mean up bound 1.3043 0.6298 1.3306 1.4168
Median up bound 1.2820 0.6080 1.3071 1.3875
SD up bound 0.4439 0.2291 0.4439 0.5079
Mean Width 1.5601 0.7448 1.6502 1.6263
Median Width 1.5453 0.7339 1.6275 1.6001
SD Width 0.0464 0.0325 0.2268 0.2290
Empirical power 0.2538 0.2625 0.2279 0.3163
δ = 1
% of Coverage 0.9330 0.9094 0.9317 0.9003
Mean low bound 0.2208 0.1052 0.1495 0.2976
Median low bound 0.2091 0.0895 0.1297 0.2824
SD low bound 0.4176 0.2154 0.4765 0.4776
Mean up bound 1.8496 0.8981 1.8910 2.0447
Median up bound 1.8164 0.8709 1.8552 2.0012
SD up bound 0.5001 0.2704 0.5028 0.5774
Mean Width 1.6288 0.7929 1.7415 1.7471
Median Width 1.6073 0.7796 1.7075 1.7060
SD Width 0.0857 0.0585 0.2815 0.2973
Empirical power 0.6958 0.6693 0.6111 0.7323
δ = 2
% of Coverage 0.9223 0.8907 0.9165 0.8828
Mean low bound 1.1208 0.5126 1.0740 1.2659
Median low bound 1.0922 0.4875 1.0516 1.2367
SD low bound 0.4665 0.2596 0.5538 0.5371
Mean up bound 2.9978 1.4732 3.0502 3.3419
Median up bound 2.9435 1.4350 2.9914 3.2694
SD up bound 0.6361 0.3617 0.6399 0.7391
Mean Width 1.8770 0.9605 1.9762 2.0759
Median Width 1.8513 0.9439 1.9194 2.0063
SD Width 0.1708 0.1112 0.3906 0.4398
Empirical power 0.9966 0.9921 0.9808 0.9962
Table 4.14: NC and BS BCa CIs under Skew-normal distributions (n1=20, n2=10, σ1=2, σ2=1)
Statistic NC(d_s) NC(Shieh’s d) BS BCa(d_s) BS BCa(g_s)
δ = 0
% of Coverage 0.9816 0.9502 0.9570 0.9290
Mean low bound -0.7620 -0.3573 -0.6605 -0.6329
Median low bound -0.7591 -0.3578 -0.6530 -0.6212
SD low bound 0.3259 0.1878 0.3361 0.3727
Mean up bound 0.7682 0.3668 0.6482 0.6482
Median up bound 0.7630 0.3597 0.6391 0.6340
SD up bound 0.3262 0.1888 0.3265 0.3680
Mean Width 1.5303 0.7241 1.3087 1.2811
Median Width 1.5260 0.7193 1.2970 1.2675
SD Width 0.0142 0.0107 0.1119 0.1108
Empirical power 0.9821 0.9513 0.9570 0.9290
δ = 0.2
% of Coverage 0.9796 0.9467 0.9552 0.9264
Mean low bound -0.5580 -0.2397 -0.4608 -0.4100
Median low bound -0.5577 -0.2410 -0.4600 -0.4039
SD low bound 0.3239 0.1874 0.3348 0.3666
Mean up bound 0.9764 0.4877 0.8589 0.8846
Median up bound 0.9644 0.4783 0.8421 0.8640
SD up bound 0.3370 0.1980 0.3409 0.3875
Mean Width 1.5343 0.7274 1.3198 1.2946
Median Width 1.5260 0.7212 1.3060 1.2780
SD Width 0.0195 0.0153 0.1228 0.1254
Empirical power 0.0431 0.1012 0.0836 0.1324
δ = 0.5
% of Coverage 0.9779 0.9429 0.9509 0.9223
Mean low bound -0.2612 -0.0699 -0.1675 -0.0873
Median low bound -0.2672 -0.0749 -0.1774 -0.0920
SD low bound 0.3247 0.1884 0.3384 0.3609
Mean up bound 1.2930 0.6727 1.1811 1.2459
Median up bound 1.2742 0.6591 1.1557 1.2147
SD up bound 0.3572 0.2137 0.3691 0.4216
Mean Width 1.5541 0.7425 1.3486 1.3332
Median Width 1.5453 0.7339 1.3307 1.3089
SD Width 0.0357 0.0278 0.1517 0.1642
Empirical power 0.1999 0.3405 0.2975 0.3966
δ = 1
% of Coverage 0.9729 0.9360 0.9385 0.9097
Mean low bound 0.2175 0.2014 0.3136 0.4268
Median low bound 0.2014 0.1935 0.2943 0.4128
SD low bound 0.3395 0.1964 0.3615 0.3704
Mean up bound 1.8399 0.9950 1.7393 1.8702
Median up bound 1.8087 0.9768 1.7022 1.8239
SD up bound 0.4069 0.2472 0.4309 0.4942
Mean Width 1.6224 0.7936 1.4257 1.4434
Median Width 1.6073 0.7832 1.4001 1.4055
SD Width 0.0693 0.0525 0.2133 0.2433
Empirical power 0.7318 0.8501 0.8093 0.8833
δ = 2
% of Coverage 0.9616 0.9324 0.9211 0.8916
Mean low bound 1.1246 0.7097 1.2363 1.3859
Median low bound 1.0922 0.6938 1.2052 1.3524
SD low bound 0.3974 0.2271 0.4287 0.4266
Mean up bound 2.9983 1.6816 2.9295 3.1979
Median up bound 2.9435 1.6505 2.8696 3.1232
SD up bound 0.5435 0.3279 0.5841 0.6757
Mean Width 1.8737 0.9719 1.6931 1.8120
Median Width 1.8513 0.9567 1.6454 1.7477
SD Width 0.1469 0.1033 0.3438 0.4057
Empirical power 0.9998 0.9999 0.9998 1.0000
Table 4.15: NC and BS BCa CIs under Skew-normal distributions (n1=75, n2=50, σ1=1, σ2=1)
Statistic NC(d_s) NC(Shieh’s d) BS BCa(d_s) BS BCa(g_s)
δ = 0
% of Coverage 0.9496 0.9491 0.9506 0.9433
Mean low bound -0.3556 -0.1735 -0.3584 -0.3473
Median low bound -0.3560 -0.1744 -0.3596 -0.3485
SD low bound 0.1840 0.0904 0.1813 0.1861
Mean up bound 0.3627 0.1785 0.3621 0.3693
Median up bound 0.3597 0.1762 0.3598 0.3664
SD up bound 0.1841 0.0904 0.1865 0.1922
Mean Width 0.7183 0.3520 0.7205 0.7167
Median Width 0.7175 0.3515 0.7190 0.7153
SD Width 0.0021 0.0012 0.0353 0.0311
Empirical power 0.9507 0.9503 0.9506 0.9433
δ = 0.2
% of Coverage 0.9466 0.9440 0.9504 0.9432
Mean low bound -0.1560 -0.0756 -0.1665 -0.1497
Median low bound -0.1570 -0.0778 -0.1689 -0.1515
SD low bound 0.1859 0.0921 0.1835 0.1879
Mean up bound 0.5640 0.2773 0.5683 0.5815
Median up bound 0.5605 0.2746 0.5652 0.5778
SD up bound 0.1890 0.0938 0.1914 0.1973
Mean Width 0.7200 0.3530 0.7347 0.7312
Median Width 0.7193 0.3524 0.7326 0.7290
SD Width 0.0038 0.0021 0.0393 0.0358
Empirical power 0.1977 0.2012 0.1813 0.2114
δ = 0.5
% of Coverage 0.9392 0.9330 0.9488 0.9407
Mean low bound 0.1413 0.0699 0.1195 0.1448
Median low bound 0.1388 0.0680 0.1161 0.1419
SD low bound 0.1914 0.0961 0.1901 0.1939
Mean up bound 0.8700 0.4279 0.8809 0.9039
Median up bound 0.8654 0.4249 0.8765 0.8987
SD up bound 0.1996 0.1008 0.2020 0.2084
Mean Width 0.7286 0.3580 0.7614 0.7591
Median Width 0.7266 0.3569 0.7578 0.7555
SD Width 0.0086 0.0049 0.0483 0.0456
Empirical power 0.7656 0.7609 0.7309 0.7715
δ = 1
% of Coverage 0.9299 0.9188 0.9457 0.9373
Mean low bound 0.6283 0.3075 0.5903 0.6287
Median low bound 0.6244 0.3041 0.5852 0.6240
SD low bound 0.2027 0.1041 0.2050 0.2072
Mean up bound 1.3887 0.6836 1.4095 1.4486
Median up bound 1.3821 0.6798 1.4034 1.4415
SD up bound 0.2204 0.1138 0.2227 0.2297
Mean Width 0.7604 0.3762 0.8193 0.8199
Median Width 0.7577 0.3748 0.8132 0.8139
SD Width 0.0179 0.0100 0.0667 0.0650
Empirical power 0.9994 0.9993 0.9990 0.9993
δ = 2
% of Coverage 0.9177 0.8998 0.9402 0.9319
Mean low bound 1.5762 0.7683 1.5141 1.5762
Median low bound 1.5683 0.7629 1.5067 1.5684
SD low bound 0.2363 0.1260 0.2469 0.2455
Mean up bound 2.4527 1.2101 2.4870 2.5579
Median up bound 2.4428 1.2039 2.4758 2.5463
SD up bound 0.2730 0.1450 0.2742 0.2832
Mean Width 0.8766 0.4418 0.9728 0.9817
Median Width 0.8745 0.4401 0.9615 0.9707
SD Width 0.0368 0.0201 0.1069 0.1057
Empirical power 1.0000 1.0000 1.0000 1.0000
Table 4.16: NC and BS BCa CIs under Skew-normal distributions (n1=75, n2=50, σ1=2, σ2=1)
Statistic NC(d_s) NC(Shieh’s d) BS BCa(d_s) BS BCa(g_s)
δ = 0
% of Coverage 0.9718 0.9501 0.9520 0.9442
Mean low bound -0.3586 -0.1754 -0.3211 -0.3167
Median low bound -0.3597 -0.1762 -0.3207 -0.3164
SD low bound 0.1632 0.0901 0.1631 0.1678
Mean up bound 0.3594 0.1765 0.3197 0.3204
Median up bound 0.3578 0.1753 0.3184 0.3188
SD up bound 0.1632 0.0902 0.1634 0.1684
Mean Width 0.7180 0.3519 0.6408 0.6371
Median Width 0.7175 0.3515 0.6398 0.6364
SD Width 0.0017 0.0011 0.0284 0.0239
Empirical power 0.9724 0.9511 0.9520 0.9442
δ = 0.2
% of Coverage 0.9708 0.9479 0.9517 0.9444
Mean low bound -0.1581 -0.0647 -0.1239 -0.1132
Median low bound -0.1588 -0.0653 -0.1243 -0.1140
SD low bound 0.1634 0.0905 0.1626 0.1671
Mean up bound 0.5615 0.2883 0.5226 0.5298
Median up bound 0.5587 0.2871 0.5199 0.5267
SD up bound 0.1661 0.0923 0.1671 0.1724
Mean Width 0.7196 0.3531 0.6465 0.6430
Median Width 0.7193 0.3524 0.6451 0.6418
SD Width 0.0031 0.0022 0.0310 0.0270
Empirical power 0.1634 0.2338 0.2220 0.2492
δ = 0.5
% of Coverage 0.9685 0.9450 0.9507 0.9427
Mean low bound 0.1399 0.0993 0.1682 0.1877
Median low bound 0.1388 0.0984 0.1663 0.1858
SD low bound 0.1657 0.0917 0.1645 0.1684
Mean up bound 0.8680 0.4582 0.8314 0.8484
Median up bound 0.8636 0.4562 0.8274 0.8438
SD up bound 0.1728 0.0967 0.1753 0.1811
Mean Width 0.7281 0.3589 0.6631 0.6608
Median Width 0.7266 0.3578 0.6607 0.6580
SD Width 0.0074 0.0051 0.0392 0.0368
Empirical power 0.7976 0.8598 0.8486 0.8692
δ = 1
% of Coverage 0.9637 0.9410 0.9478 0.9395
Mean low bound 0.6273 0.3669 0.6462 0.6782
Median low bound 0.6226 0.3649 0.6416 0.6742
SD low bound 0.1747 0.0964 0.1740 0.1768
Mean up bound 1.3872 0.7469 1.3570 1.3900
Median up bound 1.3821 0.7442 1.3508 1.3832
SD up bound 0.1900 0.1067 0.1950 0.2016
Mean Width 0.7599 0.3801 0.7107 0.7118
Median Width 0.7577 0.3792 0.7060 0.7070
SD Width 0.0154 0.0105 0.0580 0.0572
Empirical power 0.9999 1.0000 1.0000 1.0000
δ = 2
% of Coverage 0.9514 0.9363 0.9420 0.9348
Mean low bound 1.5763 0.8860 1.5750 1.6278
Median low bound 1.5683 0.8819 1.5668 1.6192
SD low bound 0.2088 0.1136 0.2104 0.2110
Mean up bound 2.4526 1.3410 2.4393 2.5034
Median up bound 2.4410 1.3354 2.4274 2.4912
SD up bound 0.2413 0.1343 0.2495 0.2583
Mean Width 0.8762 0.4549 0.8643 0.8756
Median Width 0.8745 0.4535 0.8547 0.8662
SD Width 0.0326 0.0213 0.0972 0.0977
Empirical power 1.0000 1.0000 1.0000 1.0000
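The BS BCa columns in the tables above come from bias-corrected and accelerated bootstrap intervals. The construction can be sketched as follows; this is an illustrative Python sketch (the function names, sample sizes, and number of bootstrap replicates are assumptions for the example, not the settings of the thesis simulations).

```python
import numpy as np
from scipy.stats import norm

def cohens_ds(x, y):
    """Cohen's d_s: mean difference over the pooled standard deviation."""
    nx, ny = len(x), len(y)
    sp = np.sqrt(((nx - 1) * np.var(x, ddof=1) + (ny - 1) * np.var(y, ddof=1))
                 / (nx + ny - 2))
    return (np.mean(x) - np.mean(y)) / sp

def bca_ci(x, y, stat=cohens_ds, n_boot=2000, alpha=0.05, seed=1):
    """BCa bootstrap CI for a two-sample statistic."""
    rng = np.random.default_rng(seed)
    theta = stat(x, y)
    # Resample each group independently, with replacement
    boots = np.array([stat(rng.choice(x, len(x)), rng.choice(y, len(y)))
                      for _ in range(n_boot)])
    # Bias correction: proportion of replicates below the point estimate
    z0 = norm.ppf(np.mean(boots < theta))
    # Acceleration from jackknife (leave-one-out) influence values
    jack = np.array([stat(np.delete(x, i), y) for i in range(len(x))] +
                    [stat(x, np.delete(y, j)) for j in range(len(y))])
    d = jack.mean() - jack
    a = (d ** 3).sum() / (6 * (d ** 2).sum() ** 1.5)
    # Adjusted percentile levels of the bootstrap distribution
    z = norm.ppf([alpha / 2, 1 - alpha / 2])
    levels = norm.cdf(z0 + (z0 + z) / (1 - a * (z0 + z)))
    lo, hi = np.quantile(boots, levels)
    return lo, hi
```

For well-behaved data the adjusted percentile levels stay close to alpha/2 and 1 - alpha/2, so the BCa interval differs from a plain percentile interval mainly when the bootstrap distribution is biased or skewed, as in the small-sample skew-normal conditions above.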
Table 4.17: Type I error rates and empirical powers of Student’s t-test and Welch’s t-test for two normally distributed groups
n1 n2 σ1 σ2 μ1 μ2 | Power: Student t, Welch t | Type I error rate: Student t, Welch t
10 10 1.0 1 0.1 0 0.0555 0.0539 0.0500 0.0486
10 10 1.0 1 0.5 0 0.1850 0.1814 0.0503 0.0489
10 10 1.0 1 0.8 0 0.3961 0.3907 0.0502 0.0487
10 10 1.0 1 1.5 0 0.8873 0.8843 0.0496 0.0482
10 10 1.0 1 2.0 0 0.9882 0.9877 0.0499 0.0485
10 10 1.0 1 5.0 0 1.0000 1.0000 0.0500 0.0486
10 10 0.5 1 0.1 0 0.0630 0.0580 0.0546 0.0500
10 10 0.5 1 0.5 0 0.2746 0.2591 0.0549 0.0503
10 10 0.5 1 0.8 0 0.5747 0.5530 0.0545 0.0501
10 10 0.5 1 1.5 0 0.9782 0.9738 0.0546 0.0500
10 10 0.5 1 2.0 0 0.9995 0.9993 0.0546 0.0500
10 10 0.5 1 5.0 0 1.0000 1.0000 0.0546 0.0500
10 10 1.5 1 0.1 0 0.0549 0.0522 0.0516 0.0490
10 10 1.5 1 0.5 0 0.1348 0.1293 0.0519 0.0493
10 10 1.5 1 0.8 0 0.2679 0.2590 0.0518 0.0492
10 10 1.5 1 1.5 0 0.7013 0.6899 0.0516 0.0491
10 10 1.5 1 2.0 0 0.9116 0.9055 0.0520 0.0494
10 10 1.5 1 5.0 0 1.0000 1.0000 0.0526 0.0499
15 10 1.0 1 0.1 0 0.0563 0.0555 0.0497 0.0492
15 10 1.0 1 0.5 0 0.2173 0.2128 0.0501 0.0498
15 10 1.0 1 0.8 0 0.4667 0.4581 0.0500 0.0494
15 10 1.0 1 1.5 0 0.9404 0.9351 0.0500 0.0497
15 10 1.0 1 2.0 0 0.9968 0.9961 0.0502 0.0497
15 10 1.0 1 5.0 0 1.0000 1.0000 0.0497 0.0492
15 10 0.5 1 0.1 0 0.0990 0.0595 0.0870 0.0508
15 10 0.5 1 0.5 0 0.3779 0.2730 0.0862 0.0510
15 10 0.5 1 0.8 0 0.6998 0.5771 0.0866 0.0507
"n1 "σ2"σ1 "μ2"n2 "μ1
Page ! of !36 52
15 10 0.5 1 1.5 0 0.9930 0.9792 0.0867 0.0511
15 10 0.5 1 2.0 0 0.9999 0.9996 0.0867 0.0511
15 10 0.5 1 5.0 0 1.0000 1.0000 0.0866 0.0508
15 10 1.5 1 0.1 0 0.0395 0.0533 0.0359 0.0490
15 10 1.5 1 0.5 0 0.1267 0.1581 0.0364 0.0496
15 10 1.5 1 0.8 0 0.2813 0.3320 0.0359 0.0489
15 10 1.5 1 1.5 0 0.7710 0.8165 0.0358 0.0488
15 10 1.5 1 2.0 0 0.9537 0.9682 0.0360 0.0492
15 10 1.5 1 5.0 0 1.0000 1.0000 0.0360 0.0492
20 10 1.0 1 0.1 0 0.0573 0.0573 0.0498 0.0498
20 10 1.0 1 0.5 0 0.2384 0.2314 0.0501 0.0502
20 10 1.0 1 0.8 0 0.5144 0.4976 0.0502 0.0503
20 10 1.0 1 1.5 0 0.9624 0.9543 0.0501 0.0502
20 10 1.0 1 2.0 0 0.9988 0.9980 0.0501 0.0502
20 10 1.0 1 5.0 0 1.0000 1.0000 0.0502 0.0506
20 10 0.5 1 0.1 0 0.1291 0.0599 0.1135 0.0511
20 10 0.5 1 0.5 0 0.4457 0.2790 0.1138 0.0517
20 10 0.5 1 0.8 0 0.7651 0.5870 0.1139 0.0512
20 10 0.5 1 1.5 0 0.9968 0.9815 0.1132 0.0510
20 10 0.5 1 2.0 0 1.0000 0.9997 0.1137 0.0510
20 10 0.5 1 5.0 0 1.0000 1.0000 0.1138 0.0514
20 10 1.5 1 0.1 0 0.0306 0.0544 0.0271 0.0493
20 10 1.5 1 0.5 0 0.1206 0.1792 0.0274 0.0492
20 10 1.5 1 0.8 0 0.2906 0.3838 0.0273 0.0496
20 10 1.5 1 1.5 0 0.8132 0.8764 0.0273 0.0493
20 10 1.5 1 2.0 0 0.9726 0.9860 0.0271 0.0491
20 10 1.5 1 5.0 0 1.0000 1.0000 0.0277 0.0499
20 20 1.0 1 0.1 0 0.0612 0.0608 0.0502 0.0498
20 20 1.0 1 0.5 0 0.3387 0.3376 0.0498 0.0495
20 20 1.0 1 0.8 0 0.6936 0.6925 0.0502 0.0499
20 20 1.0 1 1.5 0 0.9963 0.9962 0.0499 0.0495
20 20 1.0 1 2.0 0 1.0000 1.0000 0.0499 0.0496
20 20 1.0 1 5.0 0 1.0000 1.0000 0.0498 0.0494
20 20 0.5 1 0.1 0 0.0699 0.0669 0.0526 0.0502
20 20 0.5 1 0.5 0 0.4978 0.4885 0.0526 0.0502
20 20 0.5 1 0.8 0 0.8758 0.8703 0.0522 0.0499
20 20 0.5 1 1.5 0 0.9999 0.9999 0.0520 0.0497
20 20 0.5 1 2.0 0 1.0000 1.0000 0.0525 0.0500
20 20 0.5 1 5.0 0 1.0000 1.0000 0.0526 0.0502
20 20 1.5 1 0.1 0 0.0577 0.0564 0.0509 0.0498
20 20 1.5 1 0.5 0 0.2286 0.2254 0.0508 0.0496
20 20 1.5 1 0.8 0 0.4916 0.4869 0.0508 0.0496
20 20 1.5 1 1.5 0 0.9514 0.9500 0.0510 0.0499
20 20 1.5 1 2.0 0 0.9979 0.9978 0.0508 0.0497
20 20 1.5 1 5.0 0 1.0000 1.0000 0.0508 0.0496
30 20 1.0 1 0.1 0 0.0631 0.0629 0.0503 0.0502
30 20 1.0 1 0.5 0 0.3958 0.3929 0.0503 0.0502
30 20 1.0 1 0.8 0 0.7754 0.7717 0.0499 0.0499
30 20 1.0 1 1.5 0 0.9991 0.9990 0.0498 0.0496
30 20 1.0 1 2.0 0 1.0000 1.0000 0.0497 0.0497
30 20 1.0 1 5.0 0 1.0000 1.0000 0.0500 0.0499
30 20 0.5 1 0.1 0 0.1100 0.0685 0.0844 0.0502
30 20 0.5 1 0.5 0 0.6179 0.5130 0.0840 0.0499
30 20 0.5 1 0.8 0 0.9348 0.8891 0.0846 0.0502
30 20 0.5 1 1.5 0 1.0000 1.0000 0.0841 0.0498
30 20 0.5 1 2.0 0 1.0000 1.0000 0.0847 0.0502
30 20 0.5 1 5.0 0 1.0000 1.0000 0.0846 0.0499
30 20 1.5 1 0.1 0 0.0425 0.0588 0.0351 0.0494
30 20 1.5 1 0.5 0 0.2336 0.2830 0.0347 0.0490
30 20 1.5 1 0.8 0 0.5394 0.6011 0.0352 0.0498
30 20 1.5 1 1.5 0 0.9788 0.9860 0.0350 0.0494
30 20 1.5 1 2.0 0 0.9997 0.9998 0.0350 0.0497
30 20 1.5 1 5.0 0 1.0000 1.0000 0.0348 0.0493
40 20 1.0 1 0.1 0 0.0651 0.0650 0.0496 0.0496
40 20 1.0 1 0.5 0 0.4344 0.4280 0.0501 0.0502
40 20 1.0 1 0.8 0 0.8185 0.8115 0.0498 0.0501
40 20 1.0 1 1.5 0 0.9997 0.9996 0.0502 0.0500
40 20 1.0 1 2.0 0 1.0000 1.0000 0.0507 0.0507
40 20 1.0 1 5.0 0 1.0000 1.0000 0.0499 0.0502
40 20 0.5 1 0.1 0 0.1425 0.0697 0.1116 0.0506
40 20 0.5 1 0.5 0 0.6859 0.5260 0.1118 0.0502
40 20 0.5 1 0.8 0 0.9579 0.8984 0.1122 0.0505
40 20 0.5 1 1.5 0 1.0000 1.0000 0.1120 0.0501
40 20 0.5 1 2.0 0 1.0000 1.0000 0.1114 0.0503
40 20 0.5 1 5.0 0 1.0000 1.0000 0.1113 0.0501
40 20 1.5 1 0.1 0 0.0333 0.0601 0.0265 0.0497
40 20 1.5 1 0.5 0 0.2353 0.3240 0.0267 0.0500
40 20 1.5 1 0.8 0 0.5711 0.6727 0.0264 0.0495
40 20 1.5 1 1.5 0 0.9885 0.9946 0.0266 0.0498
40 20 1.5 1 2.0 0 0.9999 1.0000 0.0269 0.0503
40 20 1.5 1 5.0 0 1.0000 1.0000 0.0263 0.0498
50 50 1.0 1 0.1 0 0.0784 0.0783 0.0502 0.0501
50 50 1.0 1 0.5 0 0.6974 0.6973 0.0500 0.0500
50 50 1.0 1 0.8 0 0.9771 0.9771 0.0499 0.0499
50 50 1.0 1 1.5 0 1.0000 1.0000 0.0499 0.0498
50 50 1.0 1 2.0 0 1.0000 1.0000 0.0501 0.0500
50 50 1.0 1 5.0 0 1.0000 1.0000 0.0504 0.0503
50 50 0.5 1 0.1 0 0.0974 0.0958 0.0512 0.0502
50 50 0.5 1 0.5 0 0.8791 0.8772 0.0507 0.0498
50 50 0.5 1 0.8 0 0.9988 0.9988 0.0509 0.0500
50 50 0.5 1 1.5 0 1.0000 1.0000 0.0512 0.0501
50 50 0.5 1 2.0 0 1.0000 1.0000 0.0511 0.0501
50 50 0.5 1 5.0 0 1.0000 1.0000 0.0516 0.0506
50 50 1.5 1 0.1 0 0.0679 0.0674 0.0502 0.0497
50 50 1.5 1 0.5 0 0.4926 0.4910 0.0504 0.0500
50 50 1.5 1 0.8 0 0.8735 0.8727 0.0503 0.0499
50 50 1.5 1 1.5 0 0.9999 0.9999 0.0505 0.0500
50 50 1.5 1 2.0 0 1.0000 1.0000 0.0504 0.0500
50 50 1.5 1 5.0 0 1.0000 1.0000 0.0508 0.0503
75 50 1.0 1 0.1 0 0.0849 0.0848 0.0501 0.0502
75 50 1.0 1 0.5 0 0.7751 0.7741 0.0502 0.0501
75 50 1.0 1 0.8 0 0.9915 0.9913 0.0499 0.0499
75 50 1.0 1 1.5 0 1.0000 1.0000 0.0498 0.0498
75 50 1.0 1 2.0 0 1.0000 1.0000 0.0498 0.0498
75 50 1.0 1 5.0 0 1.0000 1.0000 0.0498 0.0498
75 50 0.5 1 0.1 0 0.1477 0.0987 0.0828 0.0497
75 50 0.5 1 0.5 0 0.9350 0.8972 0.0836 0.0500
75 50 0.5 1 0.8 0 0.9997 0.9993 0.0832 0.0500
75 50 0.5 1 1.5 0 1.0000 1.0000 0.0834 0.0499
75 50 0.5 1 2.0 0 1.0000 1.0000 0.0838 0.0500
75 50 0.5 1 5.0 0 1.0000 1.0000 0.0833 0.0504
75 50 1.5 1 0.1 0 0.0527 0.0727 0.0347 0.0502
75 50 1.5 1 0.5 0 0.5404 0.6019 0.0347 0.0501
75 50 1.5 1 0.8 0 0.9234 0.9440 0.0348 0.0500
75 50 1.5 1 1.5 0 1.0000 1.0000 0.0346 0.0501
75 50 1.5 1 2.0 0 1.0000 1.0000 0.0347 0.0501
75 50 1.5 1 5.0 0 1.0000 1.0000 0.0348 0.0500
100 50 1.0 1 0.1 0 0.0883 0.0880 0.0499 0.0500
100 50 1.0 1 0.5 0 0.8175 0.8149 0.0495 0.0495
100 50 1.0 1 0.8 0 0.9956 0.9954 0.0499 0.0500
100 50 1.0 1 1.5 0 1.0000 1.0000 0.0500 0.0499
100 50 1.0 1 2.0 0 1.0000 1.0000 0.0498 0.0500
100 50 1.0 1 5.0 0 1.0000 1.0000 0.0500 0.0500
100 50 0.5 1 0.1 0 0.1869 0.1003 0.1105 0.0499
100 50 0.5 1 0.5 0 0.9568 0.9065 0.1106 0.0503
100 50 0.5 1 0.8 0 0.9999 0.9995 0.1105 0.0499
100 50 0.5 1 1.5 0 1.0000 1.0000 0.1105 0.0503
100 50 0.5 1 2.0 0 1.0000 1.0000 0.1105 0.0499
100 50 0.5 1 5.0 0 1.0000 1.0000 0.1102 0.0502
100 50 1.5 1 0.1 0 0.0436 0.0768 0.0257 0.0499
100 50 1.5 1 0.5 0 0.5704 0.6730 0.0262 0.0500
100 50 1.5 1 0.8 0 0.9473 0.9706 0.0261 0.0499
100 50 1.5 1 1.5 0 1.0000 1.0000 0.0260 0.0500
100 50 1.5 1 2.0 0 1.0000 1.0000 0.0262 0.0501
100 50 1.5 1 5.0 0 1.0000 1.0000 0.0262 0.0502
Table 4.18: Type I error rates and empirical powers in Student's t-test, Welch's t-test and Mann-Whitney U test for two independent groups following skew-normal distributions
(Columns: n1, n2, σ1, σ2, μ1, μ2, followed by the empirical power and the Type I error rate of the Student t-test, the Welch t-test and the Mann-Whitney U test.)
10 10 1 1 -1 0 0.5637 0.5584 0.5403 0.0519 0.0502 0.0469
15 10 1 1 -1 0 0.6435 0.6360 0.6468 0.0518 0.0498 0.0519
20 10 1 1 -1 0 0.6930 0.6798 0.7067 0.0515 0.0502 0.0535
20 20 1 1 -1 0 0.8552 0.8545 0.8711 0.0505 0.0500 0.0554
30 20 1 1 -1 0 0.9125 0.9143 0.9273 0.0507 0.0499 0.0555
40 20 1 1 -1 0 0.9397 0.9404 0.9530 0.0505 0.0499 0.0565
50 50 1 1 -1 0 0.9977 0.9977 0.9987 0.0502 0.0501 0.0641
75 50 1 1 -1 0 0.9995 0.9996 0.9998 0.0499 0.0498 0.0671
100 50 1 1 -1 0 0.9999 0.9999 1.0000 0.0501 0.0499 0.0693
10 10 2 1 -2 0 0.7344 0.7163 0.7167 0.0624 0.0586 0.0690
15 10 2 1 -2 0 0.7875 0.8507 0.8279 0.0370 0.0539 0.0608
20 10 2 1 -2 0 0.8246 0.9179 0.8884 0.0241 0.0522 0.0535
20 20 2 1 -2 0 0.9512 0.9481 0.9606 0.0569 0.0548 0.0952
30 20 2 1 -2 0 0.9781 0.9892 0.9892 0.0322 0.0519 0.0828
40 20 2 1 -2 0 0.9892 0.9976 0.9968 0.0206 0.0512 0.0764
50 50 2 1 -2 0 0.9999 0.9999 1.0000 0.0527 0.0517 0.1535
75 50 2 1 -2 0 1.0000 1.0000 1.0000 0.0292 0.0511 0.1547
100 50 2 1 -2 0 1.0000 1.0000 1.0000 0.0180 0.0505 0.1538
10 10 2 1 -2 -1 0.3170 0.3055 0.3356 0.0610 0.0569 0.0682
15 10 2 1 -2 -1 0.3077 0.3758 0.3984 0.0354 0.0512 0.0584
20 10 2 1 -2 -1 0.2998 0.4335 0.4405 0.0229 0.0493 0.0500
20 20 2 1 -2 -1 0.5086 0.5004 0.6159 0.0562 0.0540 0.0945
30 20 2 1 -2 -1 0.5336 0.6242 0.7062 0.0313 0.0507 0.0799
40 20 2 1 -2 -1 0.5525 0.7129 0.7707 0.0199 0.0501 0.0720
50 50 2 1 -2 -1 0.8601 0.8577 0.9447 0.0527 0.0518 0.1505
75 50 2 1 -2 -1 0.9095 0.9461 0.9819 0.0287 0.0502 0.1471
100 50 2 1 -2 -1 0.9364 0.9780 0.9936 0.0180 0.0504 0.1458
Table 4.19: Type I error rates and empirical powers in Student’s t-test, Welch’s t-test and Mann-Whitney U test for two independent groups following SAS-normal distributions
(Columns: n1, n2, σ1, σ2, μ1, μ2, followed by the empirical power and the Type I error rate of the Student t-test, the Welch t-test and the Mann-Whitney U test.)
10 10 1.6153 1 1.5918 0 0.7531 0.7391 0.6761 0.0500 0.0486 0.0433
15 10 1.6153 1 1.5918 0 0.8253 0.8659 0.7792 0.0497 0.0493 0.0473
20 10 1.6153 1 1.5918 0 0.8639 0.9157 0.8298 0.0495 0.0501 0.0489
20 20 1.6153 1 1.5918 0 0.9811 0.9805 0.9508 0.0502 0.0498 0.0490
30 20 1.6153 1 1.5918 0 0.9932 0.9953 0.9778 0.0498 0.0498 0.0481
40 20 1.6153 1 1.5918 0 0.9967 0.9983 0.9875 0.0499 0.0499 0.0484
50 50 1.6153 1 1.5918 0 1.0000 1.0000 1.0000 0.0499 0.0499 0.0497
75 50 1.6153 1 1.5918 0 1.0000 1.0000 1.0000 0.0499 0.0498 0.0493
100 50 1.6153 1 1.5918 0 1.0000 1.0000 1.0000 0.0502 0.0501 0.0495
10 10 1.1473 1 -0.7058 0 0.2626 0.2565 0.2138 0.0498 0.0483 0.0430
15 10 1.1473 1 -0.7058 0 0.3097 0.3298 0.2696 0.0496 0.0492 0.0473
20 10 1.1473 1 -0.7058 0 0.3412 0.3726 0.3028 0.0497 0.0501 0.0489
20 20 1.1473 1 -0.7058 0 0.5225 0.5206 0.4343 0.0502 0.0498 0.0494
30 20 1.1473 1 -0.7058 0 0.5996 0.6174 0.5034 0.0501 0.0500 0.0480
40 20 1.1473 1 -0.7058 0 0.6468 0.6722 0.5508 0.0501 0.0501 0.0486
50 50 1.1473 1 -0.7058 0 0.9120 0.9119 0.8261 0.0498 0.0498 0.0494
75 50 1.1473 1 -0.7058 0 0.9526 0.9557 0.8885 0.0500 0.0499 0.0494
100 50 1.1473 1 -0.7058 0 0.9692 0.9726 0.9183 0.0500 0.0501 0.0493
10 10 1.6153 1.1473 -1.5918 -0.7058 0.2443 0.2321 0.2138 0.0491 0.0465 0.0431
15 10 1.6153 1.1473 -1.5918 -0.7058 0.2718 0.3322 0.2701 0.0495 0.0491 0.0478
20 10 1.6153 1.1473 -1.5918 -0.7058 0.2895 0.3914 0.3027 0.0495 0.0513 0.0491
20 20 1.6153 1.1473 -1.5918 -0.7058 0.4966 0.4923 0.4360 0.0494 0.0488 0.0491
30 20 1.6153 1.1473 -1.5918 -0.7058 0.5578 0.6085 0.5037 0.0500 0.0500 0.0485
40 20 1.6153 1.1473 -1.5918 -0.7058 0.5960 0.6722 0.5499 0.0494 0.0506 0.0484
50 50 1.6153 1.1473 -1.5918 -0.7058 0.8956 0.8953 0.8260 0.0494 0.0493 0.0493
75 50 1.6153 1.1473 -1.5918 -0.7058 0.9398 0.9492 0.8885 0.0499 0.0500 0.0495
100 50 1.6153 1.1473 -1.5918 -0.7058 0.9586 0.9693 0.9184 0.0503 0.0508 0.0497
Table 4.20: Type I error rates and empirical powers in Student's t-test, Welch's t-test and Mann-Whitney U test for two independent groups following Gamma distributions
(Columns: n1, n2, σ1, σ2, μ1, μ2, followed by the empirical power and the Type I error rate of the Student t-test, the Welch t-test and the Mann-Whitney U test.)
10 10 2 1.4142 1 0.7071 0.4865 0.4568 0.4431 0.0848 0.0810 0.1511
15 10 2 1.4142 1 0.7071 0.5300 0.6353 0.5404 0.0614 0.0598 0.1652
20 10 2 1.4142 1 0.7071 0.5600 0.7262 0.5976 0.0463 0.0502 0.1719
20 20 2 1.4142 1 0.7071 0.8331 0.8276 0.7796 0.0743 0.0729 0.2827
30 20 2 1.4142 1 0.7071 0.8860 0.9202 0.8561 0.0490 0.0589 0.3127
40 20 2 1.4142 1 0.7071 0.9141 0.9532 0.8969 0.0348 0.0525 0.3369
50 50 2 1.4142 1 0.7071 0.9977 0.9977 0.9927 0.0627 0.0621 0.5873
75 50 2 1.4142 1 0.7071 0.9996 0.9997 0.9983 0.0376 0.0548 0.6764
100 50 2 1.4142 1 0.7071 0.9998 0.9999 0.9994 0.0252 0.0516 0.7344
10 10 2 1.4142 3 1.0000 0.4583 0.4517 0.4871 0.0547 0.0525 0.0524
15 10 2 1.4142 3 1.0000 0.4900 0.5253 0.5883 0.0415 0.0504 0.0509
20 10 2 1.4142 3 1.0000 0.5093 0.5769 0.6501 0.0332 0.0499 0.0481
20 20 2 1.4142 3 1.0000 0.6983 0.6950 0.8177 0.0525 0.0517 0.0663
30 20 2 1.4142 3 1.0000 0.7552 0.7967 0.8928 0.0389 0.0504 0.0601
40 20 2 1.4142 3 1.0000 0.7919 0.8560 0.9320 0.0309 0.0499 0.0572
50 50 2 1.4142 3 1.0000 0.9663 0.9659 0.9958 0.0512 0.0509 0.0902
75 50 2 1.4142 3 1.0000 0.9862 0.9915 0.9994 0.0373 0.0503 0.0897
100 50 2 1.4142 3 1.0000 0.9933 0.9974 0.9999 0.0296 0.0501 0.0890
10 10 1 0.7071 3 1.0000 0.9941 0.9938 0.9931 0.0514 0.0490 0.0457
15 10 1 0.7071 3 1.0000 0.9987 0.9983 0.9986 0.0677 0.0505 0.0593
20 10 1 0.7071 3 1.0000 0.9996 0.9991 0.9995 0.0807 0.0514 0.0673
20 20 1 0.7071 3 1.0000 1.0000 1.0000 1.0000 0.0510 0.0500 0.0521
30 20 1 0.7071 3 1.0000 1.0000 1.0000 1.0000 0.0676 0.0503 0.0603
40 20 1 0.7071 3 1.0000 1.0000 1.0000 1.0000 0.0804 0.0507 0.0670
50 50 1 0.7071 3 1.0000 1.0000 1.0000 1.0000 0.0502 0.0498 0.0533
75 50 1 0.7071 3 1.0000 1.0000 1.0000 1.0000 0.0670 0.0500 0.0623
100 50 1 0.7071 3 1.0000 1.0000 1.0000 1.0000 0.0794 0.0498 0.0685
Note. The results are the numbers of observations required in the control group (n2); SSR is the sample size ratio (n1 : n2), from which the sample size of the experimental group (n1) can also be calculated. SDR is the ratio of the standard deviations (σ1 : σ2). The effect size is calculated by Cohen's ds for Student's t-tests and Shieh's d for Welch's t-tests; Δ is the difference between the two group means (rounded to 2 decimals).
Table 4.21: Required sample sizes in t-tests to achieve 80% power (under constant effect sizes)
(Rows are grouped by SDR (SD1/SD2); within each group, the columns are the effect size, followed by Δ and n2 for the Student t-test and the Welch t-test under SSR = 0.5, SSR = 1 and SSR = 2.)
0.5
0.1 0.09 2356 0.15 525 0.08 1571 0.16 394 0.07 1178 0.18 264
0.3 0.26 263 0.45 60 0.24 176 0.47 45 0.21 132 0.55 31
0.5 0.43 96 0.75 23 0.40 64 0.79 18 0.35 48 0.92 13
0.8 0.69 39 1.20 10 0.63 26 1.26 10 0.56 20 1.47 10
1 0.87 25 1.50 10 0.79 17 1.58 10 0.70 13 1.84 10
1
0.1 0.10 2356 0.21 526 0.10 1571 0.20 394 0.10 1178 0.21 263
0.3 0.30 263 0.64 61 0.30 176 0.60 45 0.30 132 0.64 31
0.5 0.50 96 1.06 24 0.50 64 1.00 17 0.50 48 1.06 12
0.8 0.80 39 1.70 11 0.80 26 1.60 10 0.80 20 1.70 10
1 1.00 25 2.12 10 1.00 17 2.00 10 1.00 13 2.12 10
2
0.1 0.14 2356 0.37 527 0.16 1571 0.32 394 0.17 1178 0.30 263
0.3 0.42 263 1.10 62 0.47 176 0.95 45 0.52 132 0.90 30
0.5 0.71 96 1.84 25 0.79 64 1.58 18 0.87 48 1.50 12
0.8 1.13 39 2.94 12 1.26 26 2.53 10 1.39 20 2.40 10
1 1.40 25 3.67 10 1.58 17 3.16 10 1.74 13 3.00 10
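As a check on how the Δ columns arise, they can be reproduced from the tabulated effect sizes. The sketch below is self-contained and uses two illustrative cells of Table 4.21 (SDR = 2, SSR = 1, effect size 0.5): Cohen's ds scales d by the pooled standard deviation, while Shieh's d scales it by sqrt(n1 + n2) times the standard error of the mean difference.

```r
# Delta implied by Cohen's ds (Student t-test): d * pooled SD
n1 <- 64; n2 <- 64; s1 <- 2; s2 <- 1; d <- 0.5
delta.student <- d * sqrt(((n1-1)*s1^2 + (n2-1)*s2^2) / (n1+n2-2))

# Delta implied by Shieh's d (Welch t-test): d * sqrt(n1+n2) * SE
n1 <- 18; n2 <- 18
delta.welch <- d * sqrt(n1+n2) * sqrt(s1^2/n1 + s2^2/n2)

round(c(delta.student, delta.welch), 2)  # 0.79 and 1.58, as in Table 4.21
```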
Note. The results are the numbers of observations required in the control group (n2); SSR is the sample size ratio (n1 : n2), from which the sample size of the experimental group (n1) can also be calculated. SDR is the ratio of the standard deviations (σ1 : σ2). The effect size is calculated by Cohen's ds for Student's t-tests and Shieh's d for Welch's t-tests; Δ is the difference between the two group means (rounded to 2 decimals).
Table 4.22: Required sample sizes in t-tests to achieve 80% power (under constant mean difference)
(Rows are grouped by SDR (SD1/SD2); within each group, the columns are Δ followed by the required n2 for the Student t-test and the Welch t-test under SSR = 0.5, SSR = 1 and SSR = 2.)
0.5
0.1 1768 1179 983 983 590 885
0.3 198 133 110 111 66 100
0.5 73 49 41 41 25 37
0.8 30 20 17 17 10 16
1 20 14 11 12 10 11
1
0.1 2356 2357 1571 1571 1178 1179
0.3 263 264 176 176 132 132
0.5 96 97 64 64 48 49
0.8 39 39 26 26 20 20
1 25 26 17 17 13 13
2
0.1 4711 7068 3926 3926 3533 2356
0.3 525 788 438 438 394 263
0.5 190 286 158 159 143 95
0.8 75 114 63 63 56 38
1 49 74 41 41 37 25
Syntax 1: R functions to obtain confidence intervals
#---------------------------------------------------------------------------
# Obtain confidence limits for Cohen's ds following the noncentral
# t-distribution
#---------------------------------------------------------------------------
Par.CL <- function(Group.1, Group.2){
  n1 <- length(Group.1)
  n2 <- length(Group.2)
  # perform the two-sample t-test assuming equal variances
  # (the same assumptions as Cohen's ds)
  t <- t.test(Group.1, Group.2, alternative = "two.sided",
              var.equal = TRUE)$statistic
  # find the noncentrality parameter lambda such that pt = 0.025 or 0.975
  lambda <- 0.01
  if (pt(q = t, df = n1+n2-2, ncp = lambda) > 0.025) {
    while (pt(q = t, df = n1+n2-2, ncp = lambda) - 0.025 > 0.0001) {
      lambda <- lambda + 0.01
    }
  } else if (pt(q = t, df = n1+n2-2, ncp = lambda) < 0.025) {
    while (0.025 - pt(q = t, df = n1+n2-2, ncp = lambda) > 0.0001) {
      lambda <- lambda - 0.01
    }
  }
  lambda.1 <- lambda
  delta.1 <- lambda.1 * sqrt(1/n1 + 1/n2)
  lambda <- 0.01
  if (pt(q = t, df = n1+n2-2, ncp = lambda) > 0.975) {
    while (pt(q = t, df = n1+n2-2, ncp = lambda) - 0.975 > 0.0001) {
      lambda <- lambda + 0.01
    }
  } else if (pt(q = t, df = n1+n2-2, ncp = lambda) < 0.975) {
    while (0.975 - pt(q = t, df = n1+n2-2, ncp = lambda) > 0.0001) {
      lambda <- lambda - 0.01
    }
  }
  lambda.2 <- lambda
  delta.2 <- lambda.2 * sqrt(1/n1 + 1/n2)
  delta.low <- min(delta.1, delta.2)
  delta.upp <- max(delta.1, delta.2)
  return(c(delta.low, delta.upp))
}
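As a side note (not part of the original syntax), the 0.01 step search for the noncentrality parameter can be replaced by a numerical root finder; the search interval and tolerance below are illustrative assumptions that are wide and tight enough in practice.

```r
# Hypothetical alternative: invert the noncentral t CDF with uniroot()
# instead of stepping lambda in increments of 0.01.
ncp.limit <- function(t, df, p) {
  # find lambda with pt(t, df, ncp = lambda) = p; pt() is strictly
  # decreasing in the noncentrality parameter, so the root is unique
  uniroot(function(l) pt(q = t, df = df, ncp = l) - p,
          interval = c(t - 50, t + 50), tol = 1e-8)$root
}

# e.g. the two limits of the noncentrality parameter for an observed t = 2.5:
ncp.limit(t = 2.5, df = 38, p = 0.975)  # lower confidence limit
ncp.limit(t = 2.5, df = 38, p = 0.025)  # upper confidence limit
```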
#---------------------------------------------------------------------------
# Obtain Welch confidence limits following Shieh's procedure
#---------------------------------------------------------------------------
Shieh.CL <- function(Group.1, Group.2){
  n1 <- length(Group.1)
  n2 <- length(Group.2)
  s1 <- sd(Group.1)
  s2 <- sd(Group.2)
  # perform the two-sample Welch t-test (the same assumptions as Shieh's d)
  V0 <- t.test(Group.1, Group.2, alternative = "two.sided",
               var.equal = FALSE)$statistic
  # sample estimate of the degrees of freedom nu of the noncentral
  # t-distribution
  nu <- (s1^2/n1 + s2^2/n2)^2 /
    ((s1^2/n1)^2/(n1-1) + (s2^2/n2)^2/(n2-1))
  # obtain the point estimator of the standardized mean difference
  G.nu <- gamma(nu/2) / (sqrt((n1+n2)*nu/2) * gamma((nu-1)/2))
  delta.nu <- G.nu * V0
  # find the noncentrality parameter lambda such that pt = 0.025 or 0.975
  lambda <- 0.01
  if (pt(q = V0, df = nu, ncp = lambda) > 0.025) {
    while (pt(q = V0, df = nu, ncp = lambda) - 0.025 > 0.0001) {
      lambda <- lambda + 0.01
    }
  } else if (pt(q = V0, df = nu, ncp = lambda) < 0.025) {
    while (0.025 - pt(q = V0, df = nu, ncp = lambda) > 0.0001) {
      lambda <- lambda - 0.01
    }
  }
  lambda.1 <- lambda
  delta.1 <- lambda.1 / sqrt(n1+n2)
  lambda <- 0.01
  if (pt(q = V0, df = nu, ncp = lambda) > 0.975) {
    while (pt(q = V0, df = nu, ncp = lambda) - 0.975 > 0.0001) {
      lambda <- lambda + 0.01
    }
  } else if (pt(q = V0, df = nu, ncp = lambda) < 0.975) {
    while (0.975 - pt(q = V0, df = nu, ncp = lambda) > 0.0001) {
      lambda <- lambda - 0.01
    }
  }
  lambda.2 <- lambda
  delta.2 <- lambda.2 / sqrt(n1+n2)
  delta.low <- min(delta.1, delta.2)
  delta.upp <- max(delta.1, delta.2)
  return(c(delta.low, delta.upp, delta.nu))
}
#---------------------------------------------------------------------------
# Obtain confidence limits following the bootstrap percentile and BCa methods
#---------------------------------------------------------------------------

# ---- Using Cohen's ds as point estimator ----
BCa.CL.ds <- function(Group.1, Group.2, B = 1000, alpha = 0.05){
  # B: number of bootstrap samples/replications
  # alpha: type I error rate (1 - alpha is the confidence coverage)
  n1 <- length(Group.1)
  n2 <- length(Group.2)
  Bootstrap.Results <- matrix(NA, B, 1)
  for (b in 1:B) {
    Bootstrap.Results[b, 1] <- Cohens.d(sample(Group.1, size = n1, replace = TRUE),
                                        sample(Group.2, size = n2, replace = TRUE))
  }
  # jackknife values for the acceleration constant a
  Jackknife.Results <- matrix(NA, n1+n2, 1)
  Marker.1 <- seq(1, n1, 1)
  for (sample.1 in 1:n1) {
    Jackknife.Results[sample.1, 1] <- Cohens.d(Group.1[Marker.1[-sample.1]], Group.2)
  }
  Marker.2 <- seq(1, n2, 1)
  for (sample.2 in 1:n2) {
    Jackknife.Results[n1+sample.2, 1] <- Cohens.d(Group.1, Group.2[Marker.2[-sample.2]])
  }
  Mean.Jackknife <- mean(Jackknife.Results)
  a <- (sum((Mean.Jackknife - Jackknife.Results)^3)) /
    (6 * sum((Mean.Jackknife - Jackknife.Results)^2)^(3/2))
  # bias-correction constant z0 and BCa-adjusted percentiles
  z0 <- qnorm(sum(Bootstrap.Results < Cohens.d(Group.1, Group.2)) / B)
  CI.Low.BCa <- pnorm(z0 + (z0 + qnorm(alpha/2)) / (1 - a*(z0 + qnorm(alpha/2))))
  CI.Up.BCa  <- pnorm(z0 + (z0 + qnorm(1-alpha/2)) / (1 - a*(z0 + qnorm(1-alpha/2))))
  Percentile.Confidence.Limits <- c(quantile(Bootstrap.Results, alpha/2),
                                    quantile(Bootstrap.Results, 1-alpha/2))
  BCa.Confidence.Limits <- c(quantile(Bootstrap.Results, CI.Low.BCa),
                             quantile(Bootstrap.Results, CI.Up.BCa))
  # return both interval types, consistently with BCa.CL.du
  return(c(Percentile.Confidence.Limits, BCa.Confidence.Limits))
}
# ---- Using Hedges' gs as point estimator ----
BCa.CL.du <- function(Group.1, Group.2, B = 1000, alpha = 0.05) {
  # B: number of bootstrap samples/replications
  # alpha: type I error rate (1 - alpha is the confidence coverage)
  n1 <- length(Group.1)
  n2 <- length(Group.2)
  Bootstrap.Results <- matrix(NA, B, 1)
  for (b in 1:B) {
    Bootstrap.Results[b, 1] <- Unbiased.d(sample(Group.1, size = n1, replace = TRUE),
                                          sample(Group.2, size = n2, replace = TRUE))
  }
  # jackknife values for the acceleration constant a
  Jackknife.Results <- matrix(NA, n1+n2, 1)
  Marker.1 <- seq(1, n1, 1)
  for (sample.1 in 1:n1) {
    Jackknife.Results[sample.1, 1] <- Unbiased.d(Group.1[Marker.1[-sample.1]], Group.2)
  }
  Marker.2 <- seq(1, n2, 1)
  for (sample.2 in 1:n2) {
    Jackknife.Results[n1+sample.2, 1] <- Unbiased.d(Group.1, Group.2[Marker.2[-sample.2]])
  }
  Mean.Jackknife <- mean(Jackknife.Results)
  a <- (sum((Mean.Jackknife - Jackknife.Results)^3)) /
    (6 * sum((Mean.Jackknife - Jackknife.Results)^2)^(3/2))
  # bias-correction constant z0 and BCa-adjusted percentiles
  z0 <- qnorm(sum(Bootstrap.Results < Unbiased.d(Group.1, Group.2)) / B)
  CI.Low.BCa <- pnorm(z0 + (z0 + qnorm(alpha/2)) / (1 - a*(z0 + qnorm(alpha/2))))
  CI.Up.BCa  <- pnorm(z0 + (z0 + qnorm(1-alpha/2)) / (1 - a*(z0 + qnorm(1-alpha/2))))
  Percentile.Confidence.Limits <- c(quantile(Bootstrap.Results, alpha/2),
                                    quantile(Bootstrap.Results, 1-alpha/2))
  BCa.Confidence.Limits <- c(quantile(Bootstrap.Results, CI.Low.BCa),
                             quantile(Bootstrap.Results, CI.Up.BCa))
  return(c(Percentile.Confidence.Limits, BCa.Confidence.Limits))
}
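For comparison (again not part of the original syntax), both interval types can also be obtained with the boot package that ships with R. The cohen.d.stat function below is a hypothetical stand-in for the Cohens.d helper used above, rewritten in the index-based form that boot expects; the data are simulated for illustration only.

```r
library(boot)  # recommended package, part of a standard R installation

# hypothetical stand-in for the Cohens.d helper, in boot's (data, indices) form
cohen.d.stat <- function(dat, idx) {
  d <- dat[idx, ]  # strata = dat$g makes boot resample within each group
  g1 <- d$y[d$g == 1]; g2 <- d$y[d$g == 2]
  sp <- sqrt(((length(g1)-1)*var(g1) + (length(g2)-1)*var(g2)) /
             (length(g1) + length(g2) - 2))
  (mean(g1) - mean(g2)) / sp
}

set.seed(1)
dat <- data.frame(y = c(rnorm(20, 0.5), rnorm(20)), g = rep(1:2, each = 20))
bt <- boot(dat, cohen.d.stat, R = 2000, strata = dat$g)
boot.ci(bt, type = c("perc", "bca"))  # percentile and BCa limits
```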
Syntax 2: R functions to calculate the required sample size
# Obtain the theoretical power of Student's t-test and Welch's t-test
t.power <- function(n1, n2, mu1, mu2, sigma1, sigma2, alpha = 0.05){
  # Student's t-test: pooled SD, df = n1 + n2 - 2
  t.nu <- n1 + n2 - 2
  t.critic <- qt(1 - alpha/2, df = t.nu)
  sd.pooled <- sqrt(((n1-1)*sigma1^2 + (n2-1)*sigma2^2) / (n1+n2-2))
  t.ncp <- abs((mu1-mu2) / (sqrt(1/n1 + 1/n2) * sd.pooled))
  t.power <- 1 - pt(q = t.critic, df = t.nu, ncp = t.ncp)
  # Welch's t-test: Satterthwaite degrees of freedom
  welch.nu <- (sigma1^2/n1 + sigma2^2/n2)^2 /
    ((sigma1^2/n1)^2/(n1-1) + (sigma2^2/n2)^2/(n2-1))
  welch.critic <- qt(1 - alpha/2, df = welch.nu)
  welch.ncp <- abs((mu1-mu2) / sqrt(sigma1^2/n1 + sigma2^2/n2))
  welch.power <- 1 - pt(q = welch.critic, df = welch.nu, ncp = welch.ncp)
  return(c(t.power, welch.power))
}
# Recover mu1 from a given standardized effect size
# (Cohen's ds for the Student t-test, Shieh's d for the Welch t-test)
t.mu1 <- function(d, mu2 = 0, sigma1, sigma2 = 1, n1, n2){
  d * sqrt(((n1-1)*sigma1^2 + (n2-1)*sigma2^2) / (n1+n2-2)) + mu2
}

welch.mu1 <- function(d, mu2 = 0, sigma1, sigma2 = 1, n1, n2){
  sqrt(n1+n2) * d * sqrt(sigma1^2/n1 + sigma2^2/n2) + mu2
}
# Find the minimum sample sizes given specific effect sizes
samplesize <- function(d, power, ssr, n2, sigma1, sigma2 = 1, mu2 = 0) {
  n0 <- n2  # back up n2 for the Welch calculation
  # Student's t-test
  n1 <- round(n2 * ssr)
  mu1 <- t.mu1(d = d, sigma1 = sigma1, sigma2 = sigma2, n1 = n1, n2 = n2)
  t.pow <- t.power(n1, n2, mu1, mu2, sigma1, sigma2)[1]
  while (t.pow < power) {
    n2 <- n2 + 1
    n1 <- round(n2 * ssr)  # keep n1 an integer, as in the initial step
    mu1 <- t.mu1(d, sigma1 = sigma1, sigma2 = sigma2, n1 = n1, n2 = n2)
    t.pow <- t.power(n1, n2, mu1, mu2, sigma1, sigma2)[1]
  }
  t.mu1 <- mu1
  t.n <- n2
  # Welch's t-test
  n2 <- n0
  n1 <- round(n2 * ssr)
  mu1 <- welch.mu1(d, sigma1 = sigma1, sigma2 = sigma2, n1 = n1, n2 = n2)
  welch.pow <- t.power(n1, n2, mu1, mu2, sigma1, sigma2)[2]
  while (welch.pow < power) {
    n2 <- n2 + 1
    n1 <- round(n2 * ssr)
    mu1 <- welch.mu1(d, sigma1 = sigma1, sigma2 = sigma2, n1 = n1, n2 = n2)
    welch.pow <- t.power(n1, n2, mu1, mu2, sigma1, sigma2)[2]
  }
  welch.mu1 <- mu1
  welch.n <- n2
  return(c(t.mu1, t.n, welch.mu1, welch.n))
}
# Obtain the required sample size to detect the same mean difference
samplesize.samemu <- function(diff, power, ssr, n2, sigma1, sigma2 = 1, mu2 = 0){
  n0 <- n2  # back up n2 for the Welch calculation
  n1 <- round(n2 * ssr)
  mu1 <- diff + mu2
  # Student's t-test
  t.pow <- t.power(n1, n2, mu1, mu2, sigma1, sigma2)[1]
  while (t.pow < power) {
    n2 <- n2 + 1
    n1 <- round(n2 * ssr)  # keep n1 an integer, as in the initial step
    t.pow <- t.power(n1, n2, mu1, mu2, sigma1, sigma2)[1]
  }
  t.n <- n2
  # Welch's t-test
  n2 <- n0
  n1 <- round(n2 * ssr)
  welch.pow <- t.power(n1, n2, mu1, mu2, sigma1, sigma2)[2]
  while (welch.pow < power) {
    n2 <- n2 + 1
    n1 <- round(n2 * ssr)
    welch.pow <- t.power(n1, n2, mu1, mu2, sigma1, sigma2)[2]
  }
  welch.n <- n2
  return(c(t.n, welch.n))
}
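As a plausibility check (a self-contained sketch, not part of the original syntax), the Student branch of the power calculation can be compared with stats::power.t.test, which evaluates the same noncentral t-distribution; the parameter values below are illustrative.

```r
# Theoretical two-sided power of Student's t-test via the noncentral
# t-distribution, rewritten in a self-contained form.
student.power <- function(n1, n2, delta, sd = 1, alpha = 0.05) {
  nu <- n1 + n2 - 2
  ncp <- delta / (sd * sqrt(1/n1 + 1/n2))
  crit <- qt(1 - alpha/2, df = nu)
  # both rejection tails of the noncentral t
  pt(crit, df = nu, ncp = ncp, lower.tail = FALSE) + pt(-crit, df = nu, ncp = ncp)
}

student.power(20, 20, 0.8)                       # roughly 0.69
power.t.test(n = 20, delta = 0.8, sd = 1)$power  # should agree closely
```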