
Measures of statistical dispersion

DESCRIPTIVE STATISTICS

Dr Alina Gleska

Institute of Mathematics, PUT

April 20, 2018


Measures of statistical dispersion

Measures of statistical dispersion (variation): in addition to locating the center of the observed values of a variable, another important aspect of a descriptive study is numerically measuring the extent of variation around the center. Two data sets of the same variable may have similar centers but differ remarkably in variability. We distinguish:

- classical measures, which depend on all observations;
- positional measures, which depend on the position of observations in the ordered series.


All measures of dispersion can also be divided according to another criterion:

- absolute measures, which have the same units as the variable;
- relative measures, which have no units (or are expressed as a percentage).

To compare the dispersion of variables measured in different units, we can use only relative measures.


Positional measures of dispersion

The sample range of a variable is the difference between its maximum and minimum values in a data set:

$$R = x_{\max} - x_{\min}.$$

It is a very simple measure (an advantage), but it depends only on the two extreme values and is therefore sensitive to outliers (a disadvantage). It is used only for preliminary analysis.


The sample interquartile range of a variable, denoted $R_0$ (or IQR), is the difference between the third and first quartiles of the variable, that is,

$$R_0 = Q_3 - Q_1.$$

Roughly speaking, $R_0$ gives the range of the middle 50% of the observed values.
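The two positional measures above can be illustrated with a short sketch. The sample values are made up, and the quartiles are computed with the "inclusive" interpolation rule of Python's `statistics.quantiles`, which is only one of several common quartile conventions; other conventions may give slightly different quartiles.

```python
import statistics

# Hypothetical sample (illustrative values, not taken from the lecture).
data = [2, 4, 4, 5, 7, 8, 9, 11, 12, 30]

# Sample range: R = x_max - x_min.
sample_range = max(data) - min(data)

# Quartiles Q1, Q2 (median), Q3; the "inclusive" method is an assumption.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")

# Interquartile range: R0 = Q3 - Q1.
iqr = q3 - q1

print("range:", sample_range)
print("Q1:", q1, "Q3:", q3, "IQR:", iqr)
```

Note how the single large value 30 inflates the range, while the interquartile range ignores it.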


The quartile deviation is half of the sample interquartile range:

$$Q = \frac{R_0}{2} = \frac{Q_3 - Q_1}{2}.$$

It indicates the average deviation of the middle 50% of the observed values from the median, and it is mainly used when the distribution is highly skewed.


Positional typical range of a variable

Typical units are those observations that belong to the interval $(Me - Q,\ Me + Q)$.

Remark

The positional typical range of a variable must be distinguished from the mode and from the modal interval. These are two different concepts.
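As a rough illustration, the quartile deviation and the positional typical range can be computed as follows. The data are hypothetical, and the "inclusive" quartile convention is an assumption of this sketch.

```python
import statistics

# Hypothetical sample (illustrative values only).
data = [2, 4, 4, 5, 7, 8, 9, 11, 12, 30]

# Q1, median (Me) and Q3 under the assumed "inclusive" convention.
q1, me, q3 = statistics.quantiles(data, n=4, method="inclusive")

# Quartile deviation: Q = (Q3 - Q1) / 2.
quartile_deviation = (q3 - q1) / 2

# Positional typical range: the open interval (Me - Q, Me + Q).
lower = me - quartile_deviation
upper = me + quartile_deviation
typical_units = [x for x in data if lower < x < upper]

print("Q:", quartile_deviation)
print("typical range:", (lower, upper))
print("typical units:", typical_units)
```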


Classical measures of dispersion

The average deviation measures how much the individual observations differ, on average, from the arithmetic mean:

$$d = \frac{1}{n} \sum_{i=1}^{n} |x_i - \bar{x}|,$$

where $x_i$ is the value of the $i$-th observation, $\bar{x}$ is the arithmetic mean, and $n$ denotes the total number of observations.


Properties:

- the average deviation is always non-negative: $d \ge 0$,
- $d = 0$ only if all observations are the same,
- the bigger the average deviation, the higher the diversity of the population,
- the average deviation has the same units as the variable.
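A minimal sketch of the average deviation, assuming a plain list of numbers; the helper name `average_deviation` and the sample values are illustrative only.

```python
import statistics


def average_deviation(values):
    """Mean absolute deviation from the arithmetic mean: (1/n) * sum |x_i - mean|."""
    mean = statistics.fmean(values)
    return sum(abs(x - mean) for x in values) / len(values)


# Hypothetical sample (illustrative values only).
data = [2, 4, 4, 5, 7, 8, 9, 11, 12, 30]
d = average_deviation(data)

# When all observations are identical, the deviation is zero.
d_constant = average_deviation([3, 3, 3])

print("d:", d)
print("d for identical observations:", d_constant)
```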


The variance is the average squared deviation from the mean. Its usefulness is limited because its units are the squares of the units of the original data. The sample variance is denoted by

$$s^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2,$$

where $x_i$ is the value of the $i$-th observation, $\bar{x}$ is the arithmetic mean, and $n$ is the total number of observations.


The previous formula can be rewritten in a more convenient form:

$$s^2 = \left( \frac{1}{n} \sum_{i=1}^{n} x_i^2 \right) - \bar{x}^2.$$

Properties:

- the variance is always non-negative: $s^2 \ge 0$,
- $s^2 = 0$ only if all observations are the same,
- the bigger the variance, the higher the diversity of the population.
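The sketch below checks, on made-up data, that the definition of the variance and the more convenient form give the same value, and compares both with `statistics.pvariance`, which also divides by $n$.

```python
import statistics

# Hypothetical sample (illustrative values only).
data = [2, 4, 4, 5, 7, 8, 9, 11, 12, 30]
n = len(data)
mean = sum(data) / n

# Definition: average squared deviation from the mean.
var_definition = sum((x - mean) ** 2 for x in data) / n

# Convenient form: mean of the squares minus the square of the mean.
var_shortcut = sum(x * x for x in data) / n - mean ** 2

# Library value with the same 1/n denominator.
var_library = statistics.pvariance(data)

print(var_definition, var_shortcut, var_library)
```

The convenient form is handy for hand calculation. In floating-point arithmetic it can lose precision when the mean is large relative to the spread, so the definition is usually the safer choice in code.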


The standard deviation measures how much the observations differ from the arithmetic mean:

$$s = \sqrt{s^2}.$$

The standard deviation has the same units as the variable.


Properties:

- the standard deviation is always non-negative: $s \ge 0$,
- $s = 0$ only if all observations are the same,
- the bigger the standard deviation, the higher the diversity of the population,
- the standard deviation is never smaller than the average deviation: $s \ge d$, with equality only when all absolute deviations $|x_i - \bar{x}|$ are equal,
- for distributions that are roughly symmetric and bell-shaped, the standard deviation, the average deviation and the quartile deviation typically satisfy $s > d > Q$.
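The relation between the three absolute measures can be checked numerically. This is a sketch on hypothetical data; `average_deviation` is an illustrative helper, and the "inclusive" quartile convention is an assumption. The two-point sample shows a boundary case in which the standard deviation and the average deviation coincide.

```python
import math
import statistics


def average_deviation(values):
    """Mean absolute deviation from the arithmetic mean."""
    mean = statistics.fmean(values)
    return sum(abs(x - mean) for x in values) / len(values)


# Hypothetical sample (illustrative values only).
data = [2, 4, 4, 5, 7, 8, 9, 11, 12, 30]

s = statistics.pstdev(data)       # standard deviation with the 1/n formula
d = average_deviation(data)       # average deviation
q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
q = (q3 - q1) / 2                 # quartile deviation

# Boundary case: both deviations from the mean have the same absolute value.
two_points = [1, 3]
s_two = statistics.pstdev(two_points)
d_two = average_deviation(two_points)

print("s, d, Q:", s, d, q)
print("two-point sample: s =", s_two, ", d =", d_two)
```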


Remark

The variance can also be calculated using a different formula:

$$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2.$$

Both formulas are in use. We use $n-1$ when the data come from a small sample ($n < 30$) and we want to estimate the variance of the whole, much larger population. It can be proven mathematically that the variance calculated with $n-1$ is an unbiased estimator of the population variance; for large $n$ the two formulas give nearly the same value.
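In Python's standard library the two formulas correspond to two different functions: `statistics.pvariance` divides by $n$, and `statistics.variance` divides by $n-1$. A small sketch on made-up data:

```python
import math
import statistics

# Hypothetical small sample (illustrative values only).
data = [2, 4, 4, 5, 7, 8, 9, 11, 12, 30]
n = len(data)

var_n = statistics.pvariance(data)         # denominator n
var_n_minus_1 = statistics.variance(data)  # denominator n - 1

# The two values differ exactly by the factor n / (n - 1).
ratio = var_n_minus_1 / var_n

print("1/n formula:    ", var_n)
print("1/(n-1) formula:", var_n_minus_1)
print("ratio:", ratio, "expected:", n / (n - 1))
```

The factor $n/(n-1)$ approaches 1 as $n$ grows, which is why the choice matters mostly for small samples.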


Classical typical range of a variable

Typical units are those observations that belong to the interval $(\bar{x} - s,\ \bar{x} + s)$.

Remark

The classical typical range of a variable must be distinguished from the mode and from the modal interval. These are two different concepts.
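A short sketch of the classical typical range on hypothetical data, assuming the $1/n$ formula for the standard deviation (`statistics.pstdev`):

```python
import statistics

# Hypothetical sample (illustrative values only).
data = [2, 4, 4, 5, 7, 8, 9, 11, 12, 30]

mean = statistics.fmean(data)
s = statistics.pstdev(data)  # standard deviation with the 1/n formula

# Classical typical range: the open interval (mean - s, mean + s).
lower, upper = mean - s, mean + s
typical_units = [x for x in data if lower < x < upper]

print("typical range:", (lower, upper))
print("typical units:", typical_units)
```

For this sample the outlier 30 stretches the interval, so it is much wider than an interval built from the median and the quartile deviation would be.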


Coefficients of dispersion

The quartile coefficient of dispersion is defined as:

$$V_Q = \frac{Q}{Me} \cdot 100\%.$$

Properties (for a variable with a positive median):

- $V_Q \ge 0\%$,
- $V_Q = 0\%$ if there is no diversity in the population,
- the higher the value of the quartile coefficient of dispersion, the higher the diversity of the population.


Classification of the value of the quartile coefficient of dispersion $V_Q$:

- 0% to 20%: weak diversity,
- 20% to 40%: moderate diversity,
- 40% to 60%: strong diversity,
- more than 60%: very strong diversity.
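The quartile coefficient and its classification can be sketched as follows. The data are hypothetical, the "inclusive" quartile convention is an assumption, and so is the treatment of the boundary values 20%, 40% and 60%, which the classification does not assign to a class; here each boundary is placed in the lower class.

```python
import statistics


def classify_diversity(v_percent):
    """Map a coefficient (in percent) to a class; boundary handling is an assumption."""
    if v_percent <= 20:
        return "weak"
    if v_percent <= 40:
        return "moderate"
    if v_percent <= 60:
        return "strong"
    return "very strong"


# Hypothetical sample (illustrative values only).
data = [2, 4, 4, 5, 7, 8, 9, 11, 12, 30]

q1, me, q3 = statistics.quantiles(data, n=4, method="inclusive")
quartile_deviation = (q3 - q1) / 2

# V_Q = Q / Me * 100%.
v_q = quartile_deviation / me * 100

print(f"V_Q = {v_q:.2f}% -> {classify_diversity(v_q)} diversity")
```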


The classical coefficient of variation ($V_s$) is defined as the ratio of the standard deviation $s$ to the arithmetic mean $\bar{x}$. It shows the extent of variability in relation to the mean of the population.

$$V_s = \frac{s}{\bar{x}} \cdot 100\%.$$

Properties (for a variable with a positive mean):

- $V_s \ge 0\%$,
- $V_s = 0\%$ if there is no diversity in the population,
- the higher the value of the classical coefficient of variation, the higher the diversity of the population.


Classification for the classical coefficient of variation $V_s$:

- 0% to 20%: weak diversity,
- 20% to 40%: moderate diversity,
- 40% to 60%: strong diversity,
- more than 60%: very strong diversity.
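Because $V_s$ has no units, it can be used to compare variables measured in different units. The sketch below uses made-up heights and weights and the $1/n$ standard deviation; the helper name `coefficient_of_variation` is illustrative.

```python
import math
import statistics


def coefficient_of_variation(values):
    """Classical coefficient of variation in percent: s / mean * 100."""
    return statistics.pstdev(values) / statistics.fmean(values) * 100


# Hypothetical measurements in different units (illustrative values only).
heights_cm = [160, 165, 170, 175, 180]
weights_kg = [55, 60, 70, 80, 85]

v_height = coefficient_of_variation(heights_cm)
v_weight = coefficient_of_variation(weights_kg)

# Changing the unit (cm -> m) rescales s and the mean by the same factor,
# so the coefficient stays the same up to rounding.
v_height_m = coefficient_of_variation([h / 100 for h in heights_cm])

print(f"V_s (height) = {v_height:.2f}%")
print(f"V_s (weight) = {v_weight:.2f}%")
```

In this made-up example both coefficients fall below 20%, and the weights are relatively more dispersed than the heights.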


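The variance and the standard deviation for a discrete grouped series:

$$s^2 = \frac{1}{n} \sum_{i=1}^{k} (x_i - \bar{x})^2 n_i \quad \text{or} \quad s^2 = \left( \frac{1}{n} \sum_{i=1}^{k} x_i^2 n_i \right) - \bar{x}^2.$$

The variance and the standard deviation for a continuous grouped series (grouped into intervals):

$$s^2 = \frac{1}{n} \sum_{i=1}^{k} (x_i^0 - \bar{x})^2 n_i \quad \text{or} \quad s^2 = \left( \frac{1}{n} \sum_{i=1}^{k} (x_i^0)^2 n_i \right) - \bar{x}^2.$$

Here $k$ is the number of distinct values (or intervals), $n_i$ is the number of observations with the $i$-th value (or in the $i$-th interval), $n = \sum_{i=1}^{k} n_i$, and $x_i^0$ denotes the midpoint of the $i$-th interval. In both cases $s = \sqrt{s^2}$.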
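The grouped formulas can be sketched with one helper that takes values (or interval midpoints) and their frequencies. The function name, the data, and the use of the weighted mean $\bar{x} = \frac{1}{n}\sum x_i n_i$ are assumptions of this illustration.

```python
import math


def grouped_variance(values, counts):
    """Variance of a grouped series: (1/n) * sum (x_i - mean)^2 * n_i."""
    n = sum(counts)
    mean = sum(x * c for x, c in zip(values, counts)) / n
    return sum((x - mean) ** 2 * c for x, c in zip(values, counts)) / n


# Discrete grouped series: hypothetical values and their frequencies.
values = [0, 1, 2]
counts = [3, 5, 2]
var_discrete = grouped_variance(values, counts)

# Continuous grouped series: hypothetical intervals with frequencies.
intervals = [(0, 10), (10, 20), (20, 30)]
interval_counts = [2, 5, 3]
midpoints = [(a + b) / 2 for a, b in intervals]  # x_i^0
var_intervals = grouped_variance(midpoints, interval_counts)
s_intervals = math.sqrt(var_intervals)

# Cross-check with the convenient form: mean of squares minus squared mean.
n = sum(interval_counts)
mean = sum(m * c for m, c in zip(midpoints, interval_counts)) / n
var_shortcut = sum(m * m * c for m, c in zip(midpoints, interval_counts)) / n - mean ** 2

print("discrete series variance:", var_discrete)
print("interval series variance:", var_intervals, "std:", s_intervals)
```

For interval data the result is only an approximation of the variance of the raw observations, because every observation in an interval is represented by the interval midpoint.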


Gauss (normal) distribution.


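Chebyshev's Rule

At least $1 - \frac{1}{k^2}$ of the data lie within $k$ standard deviations of the mean, that is, in the interval $(\bar{x} - ks,\ \bar{x} + ks)$, where $k > 1$. So:

- the interval $(\bar{x} - 2s,\ \bar{x} + 2s)$ contains at least $\frac{3}{4}$ (75%) of the data,
- the interval $(\bar{x} - 3s,\ \bar{x} + 3s)$ contains at least $\frac{8}{9}$ (about 88.9%) of the data,
- the interval $(\bar{x} - 4s,\ \bar{x} + 4s)$ contains at least $\frac{15}{16}$ (93.75%) of the data,
- the interval $(\bar{x} - 5s,\ \bar{x} + 5s)$ contains at least $\frac{24}{25}$ (96%) of the data.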
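Chebyshev's rule can be checked empirically. The sketch below compares the guaranteed lower bound with the share of a made-up sample that actually falls inside each interval, using the $1/n$ standard deviation; the helper names are illustrative.

```python
import statistics


def chebyshev_bound(k):
    """Guaranteed minimum share of data within k standard deviations (k > 1)."""
    return 1 - 1 / k ** 2


def fraction_within(values, k):
    """Observed share of values inside the open interval (mean - k*s, mean + k*s)."""
    mean = statistics.fmean(values)
    s = statistics.pstdev(values)
    inside = sum(1 for x in values if abs(x - mean) < k * s)
    return inside / len(values)


# Hypothetical sample (illustrative values only).
data = [2, 4, 4, 5, 7, 8, 9, 11, 12, 30]

for k in (2, 3, 4, 5):
    bound = chebyshev_bound(k)
    observed = fraction_within(data, k)
    print(f"k={k}: guaranteed >= {bound:.4f}, observed {observed:.4f}")
```

The bound holds for any distribution, so the observed shares are often much higher than the guaranteed minimum.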

Example