Applied Mathematical Sciences, Vol. 15, 2021, no. 4, 189 - 200
HIKARI Ltd, www.m-hikari.com
https://doi.org/10.12988/ams.2021.914276
A Simulation Study to Examine the Bias of
Some Sample Measures of Skewness
Nana Kena Frempong
Department of Statistics and Actuarial Science
Kwame Nkrumah University of Science and Technology-Kumasi, Ghana
Ransmond Opoku Berchie
Department of Statistics and Actuarial Science
Kwame Nkrumah University of Science and Technology-Kumasi, Ghana
Richard Baidoo
Department of Statistics and Actuarial Science
Kwame Nkrumah University of Science and Technology-Kumasi, Ghana
Benjamin Abijah
Department of Statistics and Actuarial Science
Kwame Nkrumah University of Science and Technology-Kumasi, Ghana
Osei Yaa Oforiwaa-Amanfo
Department of Statistics and Actuarial Science
Kwame Nkrumah University of Science and Technology-Kumasi, Ghana
This article is distributed under the Creative Commons by-nc-nd Attribution License.
Copyright © 2021 Hikari Ltd.
Abstract
In the last two decades, a number of modified and new measures of skewness
have been introduced for population data. Tajuddin (2016) attempted to examine
the performance of different measures of skewness. In this paper, we seek to
examine the performance of Holgersson, Pearson, Classical and Tajuddin
Skewness measures based on bias by a simulation study. From the Monte Carlo
simulation using the Inverse transform method, the sample skewness measure
proposed by Tajuddin (1999) shows the least bias under the Weibull
model. The Classical measure can be used to compute the skewness of income
and wealth data with sample sizes below 100. For larger income datasets (100
and above), the Pearson measure can be used to estimate skewness with minimal
bias.
Keywords: Skewness, Monte Carlo, Bias
1 Introduction
During the late 19th century, it was common practice among statisticians to
treat any frequency distribution as normal. Histograms displaying
multimodality were typically fitted with normal mixtures, and skewness was
removed at the outset by transformations to normality. Many inferences about the
population distribution in modern times rely on non-normal responses, which means
the data used do not need transformations to satisfy normality assumptions. The
Classical measure of skewness, "γ", measured by the standardized third moment,
and the Pearson (1905) measure, "ξ", were proposed many decades ago and are used
in many applications, yet both have been criticized in the literature. Tajuddin
(1999) argued that the Pearson measure is not reliable for less skewed
distributions. Brys, Hubert and Struyf (2003) highlighted that the Classical
measure may be strongly affected by just a single outlier. Doane and Seward
(2011), however, recommended the Pearson measure "ξ" over other measures because
it is easy to implement and is the only way to measure skewness when the
original sample data are unavailable. Holgersson (2010) established that the
Pearson measure does not uniquely determine symmetry for general distributions,
and hence proposed a modified skewness measure that is a function of both the
Classical and Pearson measures. A recent paper by Tajuddin (2016) examined the
performance of some sample measures of skewness for a number of distributions
using a simulation study. Tajuddin criticized the Classical, Pearson and
Holgersson measures, alongside other measures, for suffering in the presence of
outliers. However, a comparison based only on sensitivity to outliers is
restrictive and cannot be accepted as conclusive. Criteria such as bias and
efficiency were not used in his paper to recommend a particular measure. The
focus here is on which of these sample measures has minimal bias under different
distributions. The objective of this paper is to estimate skewness and examine
the performance of the sample measures using bias estimates under different
skewed distributions by means of a simulation study. In Section 2, the
statistical techniques used in analyzing and implementing the simulations are
described. Section 3 presents the results and a detailed discussion of the
outputs from the simulations. In Section 4, we present the conclusion of the
paper.
2 Methodology
Skewness is a measure of the asymmetry of the probability distribution of a
real-valued random variable about its mean. For the purpose of this study, we
examine the bias of Classical, Pearson (1905), Tajuddin (1999) and Holgersson
(2010) measures introduced extensively by Tajuddin (2016).
2.1 Review of Sample Measures of Skewness
The following section reviews the existing sample measures of skewness. Both the
theoretical definitions and the sample definitions are presented.
2.1.1 Classical Measure
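Theoretically, the Classical measure is defined as:
$$\gamma = \frac{\mu_3}{\sigma^3} = \frac{E\left[(X-\mu)^3\right]}{\sigma^3} \tag{2.1}$$
where $\mu$ is the mean, $\mu_3$ is the third central moment, and $\sigma$ is the standard deviation.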
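The sample version of (2.1), designated by "C", is defined as:
$$C = \frac{\sqrt{n}\,\sum_{i=1}^{n}\left(X_i-\bar{X}\right)^3}{\left[\sum_{i=1}^{n}\left(X_i-\bar{X}\right)^2\right]^{1.5}} \tag{2.2}$$
where $\bar{X}$ is the mean of a sample of size $n$.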
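The computation of C can be sketched as follows. This is a minimal illustration, assuming (2.2) is the usual moment-ratio form with a square root of n in the numerator; the function name `classical_skewness` and the toy data are hypothetical and not taken from the study.

```python
import math

def classical_skewness(xs):
    """Sample Classical measure: sqrt(n) * sum(d^3) / (sum(d^2))^1.5 (assumed form of 2.2)."""
    n = len(xs)
    mean = sum(xs) / n
    deviations = [x - mean for x in xs]
    sum_sq = sum(d ** 2 for d in deviations)     # sum of squared deviations
    sum_cube = sum(d ** 3 for d in deviations)   # sum of cubed deviations
    return math.sqrt(n) * sum_cube / sum_sq ** 1.5

# Toy data (hypothetical): one large value gives a positive C.
c_value = classical_skewness([1.0, 2.0, 3.0, 10.0])
# A symmetric sample gives C = 0.
c_symmetric = classical_skewness([1.0, 2.0, 3.0, 4.0, 5.0])
```

Because the cubed deviations keep their sign, a single large observation dominates the numerator, which is the outlier sensitivity noted in the Introduction.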
2.1.2 Pearson Measure
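The Pearson measure, denoted theoretically by ξ, is defined as:
$$\xi = \frac{\mu - m}{\sigma} \tag{2.3}$$
where $m$ is the population median. The sample version of (2.3), denoted by "P", is
defined as:
$$P = \frac{\bar{X} - \text{median}}{s} \tag{2.4}$$
where $\bar{X}$ is the sample mean and $s$ is the sample standard deviation (the square root of the unbiased variance estimator), which estimates $\sigma$.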
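A short sketch of the sample Pearson measure follows. It assumes that s uses the n - 1 denominator, which is what `statistics.stdev` computes; the function name `pearson_skewness` and the data are illustrative only.

```python
import statistics

def pearson_skewness(xs):
    """Sample Pearson measure P = (mean - median) / s, as in (2.4)."""
    mean = statistics.fmean(xs)
    median = statistics.median(xs)
    s = statistics.stdev(xs)  # assumption: n - 1 denominator
    return (mean - median) / s

# Toy data (hypothetical): the mean (4.0) exceeds the median (2.5), so P > 0.
p_value = pearson_skewness([1.0, 2.0, 3.0, 10.0])
```

Only the mean, median and standard deviation enter the formula, which is why P can be computed from summary statistics when the raw data are unavailable.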
2.1.3 Holgersson Measure
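The theoretical measure of skewness suggested by Holgersson is defined as:
$$\frac{E\left[(X-m)^3\right]}{\sigma^3} \tag{2.5}$$
where $m$ is the median and $\sigma$ is the population standard deviation. The sample version of
(2.5), denoted by "H", is defined as:
$$H = \frac{\sum_{i=1}^{n}\left(X_i-\text{median}\right)^3}{n\,\hat{\sigma}^3} \tag{2.6}$$
where $\hat{\sigma}$ is the sample estimate of $\sigma$.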
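The sample Holgersson measure can be sketched in the same way. The exact estimator of σ used in (2.6) is not stated here, so the code below assumes the usual sample standard deviation with an n - 1 denominator; that choice, the function name `holgersson_skewness` and the data are assumptions for illustration.

```python
import statistics

def holgersson_skewness(xs):
    """Sample Holgersson measure: mean cubed deviation from the median over sigma_hat^3."""
    n = len(xs)
    median = statistics.median(xs)
    sigma_hat = statistics.stdev(xs)  # ASSUMPTION: n - 1 denominator for sigma_hat
    cubed = sum((x - median) ** 3 for x in xs)
    return cubed / (n * sigma_hat ** 3)

# Toy data (hypothetical); deviations are taken from the median 2.5, not the mean.
h_value = holgersson_skewness([1.0, 2.0, 3.0, 10.0])
```

Centring on the median rather than the mean is what distinguishes H from C; on this toy sample it yields a larger value than the Classical measure.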
2.1.4 Tajuddin Measure
Tajuddin's measure is defined as:
$$2F(\mu) - 1 \tag{2.7}$$
where $F$ is the cumulative distribution function of $X$ and $\mu$ is its mean.
The sample version of (2.7) is obtained after removing the median value from the
sample and then considering;
T = (2.8)
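Since the expression for (2.8) is not reproduced above, the sketch below uses one plausible empirical analogue of (2.7): drop the median value, then replace F(µ) with the proportion of remaining observations at or below their mean. This form is an assumption and may differ from the authors' definition; `tajuddin_skewness` is a hypothetical name.

```python
import statistics

def tajuddin_skewness(xs):
    """ASSUMED empirical analogue of 2F(mu) - 1 after removing the median value."""
    data = list(xs)
    median = statistics.median(data)
    if median in data:
        data.remove(median)  # drop one occurrence of the median, if it is an observation
    mean = statistics.fmean(data)  # assumption: mean of the remaining observations
    at_or_below = sum(1 for x in data if x <= mean)
    return 2 * at_or_below / len(data) - 1

t_skewed = tajuddin_skewness([1.0, 2.0, 3.0, 10.0])          # 3 of 4 values lie below the mean
t_symmetric = tajuddin_skewness([1.0, 2.0, 3.0, 4.0, 5.0])   # symmetric sample
```

Under this assumed form the measure is bounded between -1 and 1 and depends only on counts, not on the magnitude of extreme values.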
2.2 Bias
The bias of an estimator is the difference between the expected value of the
estimator and the true value of the parameter being estimated. The theoretical bias
of an estimator $T_n$ (relative to its parameter $\theta$) is defined as:
$$\text{Bias}(T_n) = E(T_n) - \theta \tag{2.9}$$
The sample bias of an estimator is given as:
$$\text{Bias}[G] = \bar{G} - \theta$$
where $\theta$ is the true parameter, $G$ is the sample measure and $\bar{G}$ is its average over the simulated samples. In these problems, the
shape parameters of the two distributions were varied for the simulations. We use
delta as the measure of deviation of the shape parameter from its initial value:
$\delta = |a - a_0|$, where $a_0$ is the initial shape parameter and $a$ is the varying
shape parameter of the distribution.
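The sample bias described above can be estimated by simulation. The following is a minimal sketch, not the authors' R code: `monte_carlo_bias` and `delta` are hypothetical helpers, and the illustration uses an estimator whose bias is known in closed form (the variance estimator with divisor n, whose bias is -σ²/n) rather than a skewness measure.

```python
import random
import statistics

def monte_carlo_bias(estimator, sampler, true_value, n, reps=1000, seed=0):
    """Average the estimator over `reps` simulated samples and subtract the true value."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        sample = [sampler(rng) for _ in range(n)]
        estimates.append(estimator(sample))
    return statistics.fmean(estimates) - true_value

def delta(a, a0):
    """Distance of the varying shape parameter a from the initial value a0 (a0 is an assumed symbol)."""
    return abs(a - a0)

# Illustration: Uniform(0, 1) has variance 1/12; the divisor-n estimator
# has theoretical bias -(1/12)/n, i.e. about -0.0042 for n = 20.
bias = monte_carlo_bias(statistics.pvariance, lambda rng: rng.random(), 1 / 12, n=20)
```

Passing one of the skewness measures as `estimator` and a Weibull or Pareto generator as `sampler` gives the kind of bias estimate reported in Section 3.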
2.3 Simulation Study
To implement the simulation, we considered two positively skewed continuous
distributions. These distributions were chosen because of their broad application
in medicine, engineering and economics. We illustrate the techniques
with the Weibull and Pareto distributions. Kalbfleisch (1985) showed extensively
the properties of these two distributions.
The PDF, CDF and Moments of the Weibull are shown below:
, x ≥ 0
, x ≥ 0
,
Where a > 0 is the shape parameter and λ > 0 is the scale parameter of the
distribution.
The Moment Generating Function is given as;
, for λ = 1,2, 3, …
The PDF, CDF and moments of the Pareto are shown below:
, x ≥ 0
, for x ≥ σ
where k is the shape parameter and σ is the scale parameter.
The coefficient of skewness is given as;
, for k > 3
The Moment Generating Function of the Pareto distribution is given as;
Random samples from the Weibull and Pareto distributions were generated using
the Monte Carlo simulation technique. Specifically, we used the Inverse Transform
method, which generates random samples by applying the inverse CDF of the target
distribution to uniform random numbers. The simulations were implemented in RStudio version
1.0.143. One thousand samples of each size n = 20, 50 and 100 were obtained from the
Weibull and Pareto distributions. For each sample, the four sample measures C, P,
H and T are computed, and each measure is averaged over the 1000 samples. The
estimated values of the skewness measures are compared with the corresponding
population values to estimate the bias.
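The Inverse Transform step can be sketched as below. The study itself used R; this Python version is only an illustration and assumes the common parametrizations F(x) = 1 - exp(-(x/λ)^a) for the Weibull and F(x) = 1 - (σ/x)^k for the Pareto (Type I). The function names and the seed are hypothetical.

```python
import math
import random

def weibull_inverse_cdf(u, shape, scale=1.0):
    """Solve u = 1 - exp(-(x/scale)^shape) for x (assumed parametrization)."""
    return scale * (-math.log(1.0 - u)) ** (1.0 / shape)

def pareto_inverse_cdf(u, shape, scale=1.0):
    """Solve u = 1 - (scale/x)^shape for x (assumed Type I parametrization)."""
    return scale * (1.0 - u) ** (-1.0 / shape)

def draw_sample(inverse_cdf, n, rng, **params):
    """Push n Uniform(0, 1) draws through the inverse CDF of the target distribution."""
    return [inverse_cdf(rng.random(), **params) for _ in range(n)]

rng = random.Random(2021)  # arbitrary seed for reproducibility
weibull_sample = draw_sample(weibull_inverse_cdf, 20, rng, shape=0.5, scale=1.0)
pareto_sample = draw_sample(pareto_inverse_cdf, 20, rng, shape=4.1, scale=1.0)
```

Repeating `draw_sample` 1000 times for each n and averaging a skewness measure over the replicates mirrors the design described above.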
3 Results
In this section, we present a detailed discussion of the findings from the
simulated data.
3.1 Skewness of Weibull Distribution
Table 3.1 shows the simulated values of the skewness measures for different
sample sizes over varying shape parameters of the Weibull distribution. For each
value of a (the shape parameter) with fixed scale parameter (λ = 1), the true
population skewness is shown as True Value in the tabulated results.
Table 3.1: Estimated Average Skewness of Weibull (1, a) with different Sample Sizes

a     n            H            C            P        T
0.1   20           4.6047       3.7338       0.2747   0.813
      50           6.6102       6.0314       0.1883   0.884
      100          8.9573       8.5319       0.1400   0.916
      True Value   69899.9265   69899.9195   0.0023   0.9784
0.2   20           4.4008       3.3066       0.3392   0.719
      50           6.0518       5.2635       0.2533   0.781
      100          7.7589       7.1292       0.2052   0.807
      True Value   190.4922     190.3028     0.0630   0.8522
0.4   20           3.9397       2.6069       0.4066   0.538
      50           5.0067       3.8502       0.3634   0.574
      100          5.7935       4.7059       0.3454   0.590
      True Value   13.2023      12.3402      0.2801   0.6029
0.5   20           3.6317       2.3207       0.4005   0.464
      50           4.5031       3.2712       0.3853   0.496
      100          5.0842       3.9085       0.3715   0.505
      True Value   9.1084       8.0498       0.3398   0.5138
1     20           2.2243       1.3206       0.2814   0.239
      50           2.5412       1.6140       0.2952   0.256
      100          2.7325       1.7792       0.3055   0.263
      True Value   6.94945      6.0000       0.3069   0.2642
2     20           0.8201       0.4728       0.1010   0.082
      50           0.9108       0.5651       0.1124   0.080
      100          0.9352       0.5904       0.1132   0.089
      True Value   13.7208      13.3717      0.1159   0.0881
Key observations from Table 3.1:
- For all the measures except P, the average skewness values for a ≤ 2 are
no more than the True skewness values. For the P measure, the True Values are
larger than the estimated skewness values for a ≥ 1.
- The estimated skewness values for all the measures, with the exception of
P, generally increase and get closer to the population values as the
sample size increases.
- For the C and H measures, for a < 1, the True Values decrease exponentially
as a increases. The True Values, however, increase at a = 2.
- The P measure behaves oddly: both its sample and population skewness values
increase as a increases, for a < 1.
- Generally, the estimated values of the T measure increase as the sample size
increases for a < 2. At a = 2, the estimated value decreases at sample
size 50.
3.1.1 Absolute Bias of Weibull Distribution
The absolute bias associated with each sample estimate is computed and displayed
in Table 3.2. The values were computed by subtracting the True Values in Table
3.1 from the sample estimates in the same table. The absolute values of these
differences are presented in Table 3.2.
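As a check on this computation, the sketch below recomputes the absolute bias for a = 0.1 and n = 20 from the corresponding Table 3.1 entries. The dictionary names are illustrative; the numbers are copied from Table 3.1.

```python
# Table 3.1 entries for the Weibull with a = 0.1.
true_values = {"H": 69899.9265, "C": 69899.9195, "P": 0.0023, "T": 0.9784}
estimates_n20 = {"H": 4.6047, "C": 3.7338, "P": 0.2747, "T": 0.813}

# Absolute bias = |sample estimate - True Value|, rounded to four decimals.
abs_bias = {
    measure: round(abs(estimates_n20[measure] - true_values[measure]), 4)
    for measure in true_values
}
```

The resulting values (69895.3218 for H, 69896.1857 for C, 0.2724 for P and 0.1654 for T) agree with the first row of Table 3.2.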
The following observations can be made from Table 3.2.
- The absolute bias values decrease as a increases up to 1 for the H, C and T
measures for all sample sizes. However, at a = 2 the absolute bias for H
and C increases. This is a result of the True Values of these measures
increasing at a = 2 in Table 3.1.
Table 3.2: Absolute Bias of Sample Measures for Weibull (1, a) with different Sample Sizes

a     n     H            C            P        T
0.1   20    69895.3218   69896.1857   0.2724   0.1654
      50    69893.3163   69893.8881   0.1860   0.0944
      100   69890.9692   69891.3876   0.1377   0.0624
0.2   20    186.0194     186.9962     0.2762   0.1332
      50    184.4404     185.0393     0.1903   0.0712
      100   182.7333     183.1736     0.1422   0.0452
0.4   20    9.2626       9.7333       0.1265   0.0649
      50    8.1956       8.4900       0.0833   0.0289
      100   7.4088       7.6343       0.0653   0.0129
0.5   20    5.4767       5.7291       0.0607   0.0498
      50    4.6053       4.7786       0.0455   0.0178
      100   4.0242       4.1413       0.0317   0.0088
1     20    4.7252       4.6794       0.0255   0.0252
      50    4.4083       4.3860       0.0117   0.0082
      100   4.2170       4.2208       0.0014   0.0012
2     20    12.9007      12.8989      0.0149   0.0061
      50    12.8100      12.8066      0.0035   0.0081
      100   12.7856      12.7813      0.0027   0.0009
- For the P measure, however, the absolute bias increases marginally
from a = 0.1 to a = 0.2 and then decreases as a increases beyond 0.2, for
each sample size.
- As the sample size increases, the absolute bias decreases at each level of a
for all measures.
From Figure 1 (upper panel), it can be observed that:
- The bias of the P statistic declines steadily for δ ≤ 0.4, then rises at δ =
0.5. For sample size 20, the bias declines sharply at δ = 1 and then increases
steadily as δ increases. Moreover, the bias gets close to 0 at δ = 1.4 and δ
= 1.6 for sample sizes 50 and 100 respectively.
- The bias of T decreases steadily for δ ≤ 0.4, and then gradually declines to 0
for δ > 0.5.
From Figure 1 (lower panel), it can be observed that:
- The bias of C decreases sharply from δ = 0.1 to δ = 0.2 and trails down to 0
as δ increases, for all sample sizes.
- The bias of H decreases sharply from δ = 0.1 to δ = 0.2 and again from
δ = 0.2 to δ = 0.3, then gradually increases for δ > 0.5, for all sample sizes.
Figure 1: Bias plots of sample skewness measures for Weibull
3.2 Skewness of Pareto Distribution
The Pareto distribution is a skewed, heavy-tailed distribution that is usually used
to model the distribution of incomes and describe the allocation of wealth among
individuals in the theory of economics.
Table 3.3 shows the simulated values of the skewness measures over
different sample sizes and different shape parameters of the Pareto distribution. For
each value of k (the shape parameter) with fixed scale parameter (σ = 1), the true
population skewness is shown as True Value in the tabulated results.
Table 3.3: Estimated Average Skewness of Pareto (1, k) with different Sample Sizes

k     n            H             C          P
4.1   20           4.607073      3.725002   0.2774955
      50           6.821715      6.286283   0.1745165
      100          9.422526      9.049177   0.1230689
      True Value   2.9456675     9.149889   -1.313202
4.2   20           4.633934      3.760779   0.2750581
      50           6.810550      6.277746   0.1736626
      100          9.446849      9.076998   0.1219333
      True Value   1.8903061     8.822819   -1.398706
4.4   20           4.627718      3.763452   0.2724791
      50           6.879753      6.358331   0.1700825
      100          9.482156      9.114652   0.1211623
      True Value   -0.2830036    8.315869   -1.571824
4.5   20           4.638873      3.780531   0.2706314
      50           6.883433      6.363539   0.1696082
      100          9.476721      9.110023   0.1209194
      True Value   -1.4307318    8.116099   -1.659337
5.0   20           4.661283      3.826743   0.2636517
      50           6.855861      6.338895   0.1687025
      100          9.637297      9.286892   0.1156306
      True Value   -8.2028360    7.436128   -2.104795
6.0   20           4.665908      3.844712   0.2597022
      50           7.010276      6.523988   0.1589670
      100          9.641479      9.294849   0.1143992
      True Value   -29.9231864   6.804138   -3.024070
Table 3.3 presents the estimated sample skewness values and True Values of the
different measures of skewness, namely the Holgersson measure (H), the Pearson
measure (P) and the Classical measure (C). The Tajuddin measure (T) was not
considered due to convergence issues with the simulations.
Key observations from Table 3.3:
- For the H measure, as n increases the sample estimates tend to increase
and get farther from the True Values. The True Values also decrease and
take negative values as the shape parameter (k) increases.
- At k = 4.1, the C measure values increase and approach the True Value as
the sample size (n) increases from 20 to 100. However, for k ≥ 4.2, the
estimated values for sample size 100 tend to be larger than the True
Values.
- For the P measure, the True Values decrease as k increases. However, the
estimated values decrease and approach the True Values as the sample
size increases at each k.
3.2.1 Absolute Bias of Pareto Distribution
The absolute bias associated with each sample estimate is computed and displayed
in Table 3.4. The values were computed by subtracting the True Values in Table
3.3 from the sample estimates.
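For illustration, the arithmetic for k = 4.1 and n = 20 can be written out using the Table 3.3 entries (rounded here to four decimals). The P case is worth noting: because its True Value is negative while the estimate is positive, the two magnitudes add.

```latex
\begin{align*}
|H - \text{True Value}| &= |4.607073 - 2.9456675| \approx 1.6614 \\
|C - \text{True Value}| &= |3.725002 - 9.149889| \approx 5.4249 \\
|P - \text{True Value}| &= |0.2774955 - (-1.313202)| \approx 1.5907
\end{align*}
```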
Table 3.4: Absolute Bias of Pareto (1, k) with different Sample Sizes

k     n     H           C           P
4.1   20    1.661406    5.424887    1.590697
      50    3.876047    2.8636059   1.487718
      100   6.476858    0.1007119   1.436271
4.2   20    2.743628    5.062040    1.673764
      50    4.920244    2.5450734   1.572368
      100   7.556543    0.2541793   1.520639
4.4   20    4.910721    4.552417    1.844303
      50    7.162756    1.9575387   1.741907
      100   9.765160    0.7987829   1.692987
4.5   20    6.069605    4.335568    1.929969
      50    8.314165    1.7525595   1.828945
      100   10.907453   0.9939242   1.780257
5.0   20    12.864119   3.609385    2.368446
      50    15.058697   1.0972331   2.273497
      100   17.840133   1.8507636   2.220425
6.0   20    34.589094   2.959426    3.283772
      50    36.933462   0.2801503   3.183037
      100   39.564665   2.4907107   3.138469
The following observations can be made from Table 3.4.
- For the H measure, the absolute bias increases as k increases for each
sample size. Also, the absolute bias gets larger as the sample size increases
from 20 to 100 for all k.
- For the C measure, the absolute bias decreases as k increases for each
individual sample size. Also, the absolute bias gets smaller as the sample
size increases from 20 to 100 for k ≤ 4.5. However, for k > 4.5, the
absolute bias tends to increase as n increases from 50 to 100.
- For the P measure, the absolute bias increases as k increases for each
individual sample size. Again, the absolute bias decreases marginally as
the sample size increases from 20 to 100 for each k.
Figure 2: Bias Plots of Sample Measures for Pareto (Type I) Distribution
From Figure 2, it can be observed that:
- As the change in the shape parameter increases, the bias of the
H measure increases for the Pareto distribution. Also, the bias gets larger
as the sample size increases.
- As the change in the shape parameter increases, the bias of the C
measure decreases for sample sizes 20 and 50. However, for sample
size 100 the bias increases as the change in the shape parameter increases.
- The bias of the P measure increases as the change in the shape parameter
increases. Also, the bias gets smaller as the sample size increases.
4 Conclusion
4.1 Summary of Findings
The bias plot of the Tajuddin measure (T) shows that it is adequate for estimating
the skewness of the Weibull distribution with minimal bias. We observed that the bias of
T at each shape parameter gets closer to zero (0) as the sample size increases. It shows
a positively skewed plot, as expected of a Weibull distribution with shape parameter
a ≤ 2.6. The T measure seems to perform well when bias is considered.
In the case of the Pareto distribution (Type I), we observe that the bias of all three
measures is sensitive to sample size. The absolute bias is smaller for the C and
P measures than for the H measure. The Classical measure tends to
estimate the skewness of the Pareto better for smaller samples (<100). However, the
Pearson measure performs well for sample sizes of 100 and above, even
though its bias increases as the shape parameter increases.
We conclude that, for the Weibull model, the Tajuddin measure performs better
than the other measures in estimating skewness. The Classical measure can be
used to compute the skewness of income and wealth data with sample sizes below
100. For larger income datasets (100 and above), the Pearson measure can be
used to estimate skewness.
References
[1] Brys, G., Hubert, M. and Struyf, A., A comparison of some new measures of
skewness, in: Developments in Robust Statistics, ICORS 2001, eds. R. Dutter, P.
Filzmoser, U. Gather, and P.J. Rousseeuw, Springer-Verlag Heidelberg, 2003, 98-
113. https://doi.org/10.1007/978-3-642-57338-5_8
[2] Doane, D.P. and Seward, L.E., Measuring skewness: A forgotten statistic,
Journal of Statistics Education, 19 (2) (2011), 1-18.
https://doi.org/10.1080/10691898.2011.11889611
[3] Holgersson, H.E.T., A Modified Skewness Measure for Testing Asymmetry,
Communications in Statistics - Simulation and Computation, 39 (2010), 335-346.
https://doi.org/10.1080/03610910903453419
[4] Kalbfleisch, J.G., Probability and Statistical Inference, Vol. 2: Statistical
Inference, Springer, 1985. https://doi.org/10.1007/978-1-4612-1096-2
[5] Pearson, K., Contributions to the mathematical theory of evolution. II. Skew
variation in homogeneous material, Philos. Trans. Roy. Soc. Lond., A186 (1895),
343–414. https://doi.org/10.1098/rsta.1895.0010
[6] Tajuddin, I.H., A simple measure of skewness, Statistica Neerlandica, 50
(1996), 362-366. https://doi.org/10.1111/j.1467-9574.1996.tb01502.x
[7] Tajuddin, I.H., A comparison between two simple measures of skewness,
Journal of Applied Statistics, 26 (1999), 767-774.
https://doi.org/10.1080/02664769922205
[8] Tajuddin I.H., A simulation study of some sample measures of skewness, Pak.
J. Statist., 32 (1) (2016), 49-62.
Received: October 5, 2020; Published: April 7, 2021