0actual dispersion
TRANSCRIPT
-
8/12/2019 0actual Dispersion
1/53
Measures of Dispersion
-
Definition
While measures of central tendency indicate what value of a variable is (in one sense or other) average or central or typical in a set of data, measures of dispersion (or variability or spread) indicate (in one sense or other) the extent to which the observed values are spread out around that center: how far apart observed values typically are from each other or from some average value (in particular, the mean). Thus:
if all cases have identical observed values (and thereby are also identical to [any] average value), dispersion is zero;
if most cases have observed values that are quite close together (and thereby are also quite close to the average value), dispersion is low (but greater than zero); and
if many cases have observed values that are quite far away from many others (or from the average value), dispersion is high.
-
Measures of Dispersion
Synonym for variability
Often called spread or scatter
Indicator of consistency within a data set
Indicates how closely data are clustered about a measure of central tendency
-
Example
Consider the following data related to the age distribution of two groups A and B:
Group   Ages (years)        Average
A       22  24  25  26  28  25
B        8  15  20  28  54  25
-
The two groups above have the same average, i.e. 25 years, so we are likely to conclude that the two groups are similar.
This conclusion is wrong: the observations in group A are close to one another, indicating that people in this group are all roughly 22 to 28 years old.
-
Those in group B, by contrast, are widely dissimilar and have greater variability of ages: the group includes a person who is 8 years old on one hand and a person of age 54 on the other.
This means that the central value alone does not give a clear indication of the pattern of the distribution.
A measure of dispersion or variability gives us information about the spread of the observations in a distribution.
Here, the dispersion of group B is greater than that of group A.
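The comparison of groups A and B can be checked with a short Python sketch. It uses the ages from the table; the choice of the range and the population standard deviation as the two spread measures is an illustrative assumption, since the slide itself does not name a measure at this point.

```python
from statistics import mean, pstdev

# Ages (years) of the two groups from the example table.
group_a = [22, 24, 25, 26, 28]
group_b = [8, 15, 20, 28, 54]

for name, ages in (("A", group_a), ("B", group_b)):
    spread = max(ages) - min(ages)  # range: largest minus smallest value
    print(f"Group {name}: mean={mean(ages)}, range={spread}, SD={pstdev(ages):.2f}")
```

Both groups report the same mean, while the range and the standard deviation are much larger for group B.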
-
Purpose of Measuring Variation
To test the reliability of an average
To serve as a basis for control of variability
To compare two or more series with regard to variability
To serve as a basis for further statistical analysis.
-
Properties of a good measure of variation
It should be simple to understand and easy to calculate.
It should be based on all observations.
It should be amenable to further algebraic treatment.
It should not be affected by extreme observations.
-
Measures of variation
Absolute measures
Range
Quartile deviation
Mean deviation
Standard deviation / variance
Lorenz curve
Relative measures
Coefficient of range
Coefficient of variation
Coefficient of quartile deviation
Coefficient of mean deviation
-
Absolute measures of variation
They are expressed in the same statistical unit in which the original data are given, such as rupees, kg, etc.
These values are used to compare the variation in two or more distributions, provided the variables are expressed in the same units and have almost the same average value.
-
Relative measures of variation
An absolute measure of dispersion expresses variation in the same units as the original data.
To compare the variation of two different series, a relative measure of dispersion (a unit-free ratio such as the coefficient of variation) is calculated.
-
Range
Range is the preliminary indicator of dispersion.
The (total or simple) range is the maximum (highest) value observed in the data [the value of the case at the 100th percentile] minus the minimum (lowest) value observed in the data [the value of the case at the 0th percentile].
That is, it is the distance or interval between the values of these two most extreme cases.
Indicates how spread out the data are.
Open-ended distributions have no range because no highest or lowest values exist in an open-ended class.
-
The Range
The range is defined as the difference between the largest score in the set of data and the smallest score in the set of data, X_L - X_S.
What is the range of the following data:
4 8 1 6 6 2 9 3 6 9
The largest score (X_L) is 9; the smallest score (X_S) is 1; the range is X_L - X_S = 9 - 1 = 8
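As a quick check, the same calculation can be written in a few lines of Python. The variable names are illustrative only.

```python
# Scores from the slide's example.
scores = [4, 8, 1, 6, 6, 2, 9, 3, 6, 9]

largest = max(scores)    # X_L
smallest = min(scores)   # X_S
data_range = largest - smallest

print(largest, smallest, data_range)  # 9 1 8
```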
-
Coefficient of scatter
Ratio of range
Coefficient of range = (Max - Min) / (Max + Min) = Absolute range / Sum of the extreme values
-
Dispersion Example
Number of minutes 20 clients waited to see a consultant
Consultant X    Consultant Y
05  15          11  12
12  03          10  13
04  19          11  10
37  11          09  13
06  34          09  11
Consultant X:
Sees some clients almost immediately
Others wait over 1/2 hour
Highly inconsistent
Consultant Y:
Clients wait about 10 minutes
9 minutes least wait and 13 minutes most
Highly consistent
-
Solution
1. Coefficient of range for Consultant X
= (Max - Min) / (Max + Min)
= (37 - 03) / (37 + 03) = 34/40 = 0.85
2. Coefficient of range for Consultant Y
= (Max - Min) / (Max + Min)
= (13 - 09) / (13 + 09) = 4/22 = 0.18
Consultant X is inconsistent and Consultant Y is consistent in their job.
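The two coefficients can be reproduced with a small Python sketch. The function name `coefficient_of_range` is a hypothetical helper introduced here for illustration; the waiting times are the ones listed for the two consultants.

```python
def coefficient_of_range(values):
    """(max - min) / (max + min); assumes max + min is not zero."""
    highest, lowest = max(values), min(values)
    return (highest - lowest) / (highest + lowest)

# Waiting times in minutes, ten clients per consultant.
consultant_x = [5, 15, 12, 3, 4, 19, 37, 11, 6, 34]
consultant_y = [11, 12, 10, 13, 11, 10, 9, 13, 9, 11]

print(round(coefficient_of_range(consultant_x), 2))  # 0.85
print(round(coefficient_of_range(consultant_y), 2))  # 0.18
```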
-
Uses
QUALITY CONTROL:
The objective of quality control is to keep a check on the quality of the product without 100% inspection.
When statistical methods of quality control are used, control charts are prepared in which range plays an important role.
The basic idea is that as long as manufactured products conform to set standards (range), the production process is assumed to be in control.
WEATHER FORECASTS:
The range helps the general public to know within what limits the temperature is likely to vary on a particular day.
-
Quartile deviation
It is based on the distance between the lowest and highest of the middle 50 percent of the scores of the distribution.
Q.D. is superior to range, as it is not based on two extreme values but rather on the middle 50% of observations.
It can be calculated from open-ended classes.
It is often used with skewed data as it is insensitive to the extreme scores.
-
Interquartile Range
Interquartile range = Q3 - Q1
Semi-interquartile range or quartile deviation is defined as
= (Q3 - Q1)/2
Coefficient of quartile deviation is
= (Q3 - Q1)/(Q3 + Q1)
-
When Q.D. is small, it indicates high uniformity among the central 50% of observations.
High Q.D. means high variation among the central observations.
-
Interquartile Range Example
The number of complaints received by the manager of a supermarket was recorded for each of the last 10 working days.
21, 15, 18, 5, 10, 17, 21, 19, 25 & 28
Sorted data
5, 10, 15, 17, 18, 19, 21, 21, 25 & 28
Position of Q1 = (n + 1)/4 = 11/4 = 2.75, i.e. the 3rd observation, so Q1 = 15
Position of Q3 = 3(n + 1)/4 = 33/4 = 8.25, i.e. the 8th observation, so Q3 = 21
Interquartile range = 21 - 15 = 6 complaints
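The positional method used in this example can be sketched in Python as follows. The helper `quartile_by_position` is hypothetical and follows the slide's convention of rounding the position k(n + 1)/4 to the nearest observation; other conventions (for example, interpolating between neighbouring observations) can give slightly different quartiles.

```python
def quartile_by_position(sorted_values, k):
    """k-th quartile taken at position k(n + 1)/4, rounded to the nearest rank."""
    n = len(sorted_values)
    position = k * (n + 1) / 4
    nearest_rank = int(position + 0.5)       # 2.75 -> 3, 8.25 -> 8
    return sorted_values[nearest_rank - 1]   # ranks are 1-based

complaints = sorted([21, 15, 18, 5, 10, 17, 21, 19, 25, 28])

q1 = quartile_by_position(complaints, 1)
q3 = quartile_by_position(complaints, 3)
print(q1, q3, q3 - q1)  # 15 21 6
```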
-
Calculating exactly: Q1
Using the formula:
Q1 = l + ((N/4 - F) / f) × i
X         f    CF
0 < 20    15   15
20 < 40   60   75
40
-
Q3
Third Quartile
This is in the group 20 < 40
Lower limit (l) is 20
Width of group (i) is 20
Frequency of group (f) is 60
CF of previous group (F) is 15
X         f    CF
0 < 20    15   15
20 < 40   60   75
40
-
Interquartile Range and Coefficient of Q.D.
Interquartile range = 40 - 23.333 = 16.67
Semi-interquartile range or quartile deviation is defined as
= (Q3 - Q1)/2 = 16.67/2 = 8.335
Coefficient of quartile deviation is
= (Q3 - Q1)/(Q3 + Q1) = 16.67/63.33 = 0.26
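The grouped-data results above can be reproduced with a short Python sketch of the interpolation formula l + ((kN/4 - F)/f) × i. Two things here are assumptions rather than statements from the slides: the total frequency N = 100, which is inferred from the quoted results (Q1 = 23.333, Q3 = 40) because the frequency table is cut off, and the helper name `grouped_quartile`. With N = 100, both quartiles fall in the group 20 < 40.

```python
def grouped_quartile(k, total, lower, width, freq, cum_before):
    """k-th quartile of grouped data: l + ((k*N/4 - F) / f) * i."""
    return lower + ((k * total / 4 - cum_before) / freq) * width

N = 100  # ASSUMPTION: inferred from the stated results, not shown in the table

# Both quartiles lie in the group 20 < 40 (l = 20, i = 20, f = 60, F = 15).
q1 = grouped_quartile(1, N, lower=20, width=20, freq=60, cum_before=15)
q3 = grouped_quartile(3, N, lower=20, width=20, freq=60, cum_before=15)

iqr = q3 - q1
quartile_deviation = iqr / 2
coefficient_qd = iqr / (q3 + q1)

print(round(q1, 3), round(q3, 3), round(iqr, 2), round(coefficient_qd, 2))
```

The unrounded quartile deviation is about 8.33; the slide's 8.335 comes from halving the already rounded 16.67.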
-
Example
Use an appropriate measure to evaluate the variation in the following data:
Weekly income (Rs.)   No. of workers
below 1350            8
1350-1370             16
1370-1390             39
1390-1410             58
1410-1430             60
1430-1450             40
1450-1470             22
1470-1490             15
1490-1510             15
1510-1530             9
1530 and above        10
-
Problems with quartile Deviation
It is not based on all the observations
Affected by sampling fluctuations
Not suitable for further algebraic treatment
-
Deviation Measures of Dispersion (cont.)
The deviation from the mean for a representative case i is x_i - x̄, where x̄ is the mean of x.
If almost all of these deviations are small, dispersion is small.
If many of these deviations are large, dispersion is large.
This suggests we could construct a measure D of dispersion that would simply be the average (mean) of all the deviations.
But this will not work because, as we saw earlier, it is a property of the mean that all deviations from it add up to zero.
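The zero-sum property follows in one line from the definition of the mean. Writing n for the number of cases (a symbol introduced here for the derivation), and using the fact that the sum of all values equals n times the mean:

```latex
\sum_{i=1}^{n} (x_i - \bar{x})
  = \sum_{i=1}^{n} x_i - n\bar{x}
  = n\bar{x} - n\bar{x}
  = 0
```

So the plain average of the deviations is always zero, whatever the spread of the data, which is why it cannot serve as a measure of dispersion.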
-
Deviation Measures of Dispersion: Example (cont.)
-
The Mean Deviation
A practical way around this problem is simply to ignore the fact that some deviations are negative while others are positive by averaging the absolute values of the deviations.
This measure (called the mean deviation) tells us the average (mean) amount that the values for all cases deviate (regardless of whether they are higher or lower) from the average (mean) value.
Indeed, the Mean Deviation is an intuitive, understandable, and perfectly reasonable measure of dispersion, and it is occasionally used in research.
The mean deviation takes into consideration all of the values.
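A minimal Python sketch makes the contrast concrete. It reuses the group B ages from the earlier example (an illustrative choice, not a calculation taken from the slides): the signed deviations cancel out, while their absolute values average to a usable measure.

```python
# Group B ages from the earlier example.
ages = [8, 15, 20, 28, 54]

centre = sum(ages) / len(ages)              # arithmetic mean, 25.0
deviations = [x - centre for x in ages]     # signed deviations from the mean

# Averaging the absolute deviations gives the mean deviation.
mean_deviation = sum(abs(d) for d in deviations) / len(ages)

print(sum(deviations))   # 0.0
print(mean_deviation)    # 12.8
```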
-
The Mean Deviation (cont.)
-
Frequency Distribution Mean Deviation
If the data are in the form of a frequency distribution, the mean deviation can be calculated using the following formula:
$MD = \frac{\sum f\,|x - \bar{x}|}{\sum f}$
Where: f = the frequency of an observation x
n = $\sum f$ = the sum of the frequencies
This measure is an improvement over the previous two measures in the sense that it considers all observations of a data set.
-
Coefficient of mean deviation
Coefficient of mean deviation = Mean deviation / Mean
-
Example
Find out the mean deviation for the following distribution of demand for a book.
Quantity demanded (units), x   Frequency, f   fx     |x - x̄|   f|x - x̄|
6                              4              24     17.6      70.4
12                             7              84     11.6      81.2
18                             10             180    5.6       56
24                             18             432    0.4       7.2
30                             12             360    6.4       76.8
36                             7              252    12.4      86.8
42                             2              84     18.4      36.8
Total                          60             1416             415.2
Mean: $\bar{x} = \frac{\sum fx}{\sum f}$ = 1416/60 = 23.6
Mean deviation: $MD = \frac{\sum f\,|x - \bar{x}|}{\sum f}$ = 415.2/60 = 6.92
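The worked example can be verified with a short Python sketch of the frequency-distribution formula. The variable names are illustrative; the quantities and frequencies are the ones in the table.

```python
# Quantity demanded (x) and its frequency (f) from the example table.
quantities = [6, 12, 18, 24, 30, 36, 42]
frequencies = [4, 7, 10, 18, 12, 7, 2]

n = sum(frequencies)  # sum of the frequencies

# Weighted mean: sum(f * x) / sum(f)
mean = sum(f * x for x, f in zip(quantities, frequencies)) / n

# Mean deviation: sum(f * |x - mean|) / sum(f)
md = sum(f * abs(x - mean) for x, f in zip(quantities, frequencies)) / n

print(n, round(mean, 2), round(md, 2))  # 60 23.6 6.92
```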
-
Problems with Mean Deviation
Algebraic signs are ignored while taking the deviations of the items.
Cannot be computed for distributions with open-ended classes.
Not suitable for further mathematical treatment.
-
Standard Deviation
Standard deviation is the most commonly used measure of dispersion
Similar to the mean deviation, the standard deviation takes into account the value of every observation
It is the measure of the degree of dispersion of the data from the mean value.
-
First, the standard deviation formula says to subtract the mean from each of the scores
This difference is called a deviate or a deviation score
The deviate tells us how far a given score is from the typical, or average, score
Thus, the deviate is a measure of dispersion for a given score
-
It is a statistic that tells us how tightly all the various values are clustered around the mean in a set of data.
Large S.D. indicates that data points are far from the mean
Small S.D. indicates that all the data points cluster closely around the mean.
-
Standard Deviation
It is the positive square root of the arithmetic mean of the squares of the deviations of the observations from their arithmetic mean.
Calculation:
Calculate the arithmetic mean (AM)
Subtract the AM from each individual value
Square each difference (multiply it by itself)
Sum (total) the squared differences
Divide the total by the number of values (N)
Calculate the square root of the result
Formula:
$\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$
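The calculation steps can be followed line by line in a short Python sketch. The data are the group A ages from the earlier example (an illustrative choice), and the final assertion compares the result with `statistics.pstdev`, which uses the same divide-by-n definition.

```python
from math import sqrt
from statistics import pstdev

values = [22, 24, 25, 26, 28]

am = sum(values) / len(values)                # step 1: arithmetic mean (25.0)
differences = [x - am for x in values]        # step 2: subtract the AM from each value
squared = [d ** 2 for d in differences]       # step 3: square each difference
total = sum(squared)                          # step 4: sum the squares (20.0)
variance = total / len(values)                # step 5: divide by the number of values
sd = sqrt(variance)                           # step 6: take the square root

print(sd)  # 2.0
assert abs(sd - pstdev(values)) < 1e-12
```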
-
The Mean, Deviations, Variance, and SD
What is the effect of adding a constant amount to (or subtracting it from) each observed value?
What is the effect of multiplying each observed value by (or dividing it by) a constant amount?
-
Adding (subtracting) the same amount to (from) every observed value changes the mean by the same amount but does not change the dispersion (for either range or deviation measures).
Multiplying every observed value by the same (positive) factor changes the mean and the SD [or MD] by that same factor and changes the variance by that factor squared.
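Both effects can be seen in a small Python sketch. The data (group A ages), the shift of 10 and the factor of 3 are illustrative choices, not values from the slides.

```python
from statistics import mean, pstdev, pvariance

values = [22, 24, 25, 26, 28]
shifted = [x + 10 for x in values]   # add a constant to every value
scaled = [x * 3 for x in values]     # multiply every value by a constant

for label, data in (("original", values), ("plus 10", shifted), ("times 3", scaled)):
    print(label, mean(data), pstdev(data), pvariance(data))
```

Adding 10 moves the mean from 25 to 35 and leaves the SD at 2; multiplying by 3 moves the mean to 75, the SD to 6, and the variance from 4 to 36.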
-
Usefulness
Manufacturers interested in producing items of consistent quality are very much concerned with S.D.
If the mean life of the component is 4 years and the S.D. is very large, it would correspond to many failures long before 4 years.
Quality control requires consistency and consistency requires a relatively small S.D.
-
Variance
The square of the standard deviation. More useful when we begin analysis rather than description:
$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
-
What Does the Variance Formula Mean?
Variance is the mean of the squared deviation scores
The larger the variance is, the more the scores deviate, on average, away from the mean
The smaller the variance is, the less the scores deviate, on average, from the mean
-
Combined Variance (For different means)
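$\sigma^2 = \frac{n_1(\sigma_1^2 + d_1^2) + n_2(\sigma_2^2 + d_2^2)}{n_1 + n_2}$
where $d_1 = \bar{x}_1 - \bar{x}$ and $d_2 = \bar{x}_2 - \bar{x}$ are the deviations of the two group means from the combined mean $\bar{x}$.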
-
Exercise 3
The mean and s.d. of the lives of tyres manufactured by two factories of Durable Tyre Company, making 50,000 tyres annually at each of the two factories, are given below. Calculate the combined mean and standard deviation of the life of all the 100,000 tyres produced in a year.
Factory   Sample Size   Mean (000 Kms)   SD (000 Kms)
1         50            60               8
2         50            50               7
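One way to approach this exercise is sketched below in Python, using the combined-variance formula for groups with different means. The function name `combined_mean_sd` is hypothetical, and the sketch weights the two factories by their sample sizes; since both factories have equal sample sizes and equal annual output, weighting by output would give the same answer.

```python
from math import sqrt

def combined_mean_sd(n1, mean1, sd1, n2, mean2, sd2):
    """Combined mean and SD of two groups that may have different means."""
    total = n1 + n2
    mean = (n1 * mean1 + n2 * mean2) / total
    d1, d2 = mean1 - mean, mean2 - mean   # deviations of group means from combined mean
    variance = (n1 * (sd1 ** 2 + d1 ** 2) + n2 * (sd2 ** 2 + d2 ** 2)) / total
    return mean, sqrt(variance)

# Factory 1: n = 50, mean 60, SD 8; Factory 2: n = 50, mean 50, SD 7 (thousand km).
mean, sd = combined_mean_sd(50, 60, 8, 50, 50, 7)
print(mean, round(sd, 2))  # 55.0 9.03
```

The combined SD (about 9.03 thousand km) is larger than either factory's own SD because the two factory means differ by 10 thousand km.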
-
Combined Variance (For same means)
$\sigma^2 = \frac{n_1\sigma_1^2 + n_2\sigma_2^2}{n_1 + n_2}$
-
Example
The following data relate to clients obtained by insurance agents during a given period for two types of insurance policies, a child policy and a retirement policy. Calculate the combined S.D.
                                 Child policy   Retirement policy
No. of agents                    25             18
Average no. of clients booked    72             64
Variance of the distribution     8              6
-
The Coefficient of Variation
It is the most important relative measure of dispersion
One ratio measure of dispersion/inequality is the coefficient of variation, which is simply the standard deviation divided by the mean.
It answers the question: how big is the SD relative to the mean?
Coefficient of variation = $\frac{s}{\bar{x}} \times 100$
-
It is therefore a useful statistic to compare the degree of variation from one data series to another.
It helps us to determine how much volatility (risk) we are assuming in comparison to the amount of return one can expect from an investment
The lower the coefficient of variation, the better the risk-return tradeoff.
The distribution for which C.V. is higher is said to be less stable, less uniform, less consistent, less homogeneous.
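As an illustration, the coefficient of variation can be computed for the two consultants' waiting times from the earlier example. This is a sketch under stated assumptions: the helper name `coefficient_of_variation` is hypothetical, and the sample standard deviation (`statistics.stdev`, dividing by n - 1) is used because the slides do not say which form of the SD is intended here.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Sample SD as a percentage of the mean; assumes a non-zero mean."""
    return stdev(values) / mean(values) * 100

# Waiting times in minutes from the consultant example.
consultant_x = [5, 15, 12, 3, 4, 19, 37, 11, 6, 34]
consultant_y = [11, 12, 10, 13, 11, 10, 9, 13, 9, 11]

cv_x = coefficient_of_variation(consultant_x)
cv_y = coefficient_of_variation(consultant_y)
print(f"X: {cv_x:.1f}%  Y: {cv_y:.1f}%")
```

Consultant X's coefficient is several times larger than Consultant Y's, which matches the earlier conclusion that X is the less consistent of the two.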
-
Measure of Skew
Skew is a measure of symmetry in the distribution of scores
[Figure: three curves labelled Positive Skew, Negative Skew, and Normal (skew = 0)]
-
Measure of Skewness
Measure of skewness of a distribution is given by
= 3(mean - median) / S.D.
This measure is known as Karl Pearson's coefficient of skewness and lies between -3 and +3.
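The coefficient can be tried out on the two age groups from the earlier example. This is a minimal sketch: the helper name `pearson_skewness` is hypothetical, and the population standard deviation (`statistics.pstdev`) is assumed because the slide does not specify which form of the S.D. to use.

```python
from statistics import mean, median, pstdev

def pearson_skewness(values):
    """3 * (mean - median) / SD; assumes the SD is not zero."""
    return 3 * (mean(values) - median(values)) / pstdev(values)

group_a = [22, 24, 25, 26, 28]   # mean 25, median 25
group_b = [8, 15, 20, 28, 54]    # mean 25, median 20

print(pearson_skewness(group_a))            # 0.0
print(round(pearson_skewness(group_b), 2))  # 0.94
```

Group A comes out symmetric, while group B is positively skewed because its mean is pulled above the median by the 54-year-old.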
-
A distribution is said to be symmetric if mean = median = mode
A distribution is said to be positively skewed if mean > median > mode
A distribution is said to be negatively skewed if mean < median < mode
The smaller the coefficient in absolute value, the less the skewness. If the coefficient of skewness is 0, the data are exactly balanced.
Bell-shaped curve showing the relationship between σ and μ.
-
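[Figure: normal curve with the horizontal axis marked at μ - 3σ, μ - 2σ, μ - 1σ, μ, μ + 1σ, μ + 2σ, μ + 3σ; about 68% of values lie within μ ± 1σ, 95% within μ ± 2σ, and 99.7% within μ ± 3σ]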