0actual dispersion

Upload: sushantgaur

Post on 03-Jun-2018


TRANSCRIPT

  • 8/12/2019 0actual Dispersion

    Measures of Dispersion

    Definition

    While measures of central tendency indicate what value of a variable is (in one sense or another) average, central, or typical in a set of data, measures of dispersion (or variability or spread) indicate the extent to which the observed values are spread out around that center: how far apart observed values typically are from each other or from some average value (in particular, the mean). Thus:

    if all cases have identical observed values (and thereby are also identical to any average value), dispersion is zero;

    if most cases have observed values that are quite close together (and thereby are also quite close to the average value), dispersion is low (but greater than zero); and

    if many cases have observed values that are quite far away from many others (or from the average value), dispersion is high.

    Measures of Dispersion

    Synonym for variability

    Often called spread or scatter

    Indicator of consistency within a data set

    Indicates how closely data are clustered about a measure of central tendency

    Example

    Consider the following data on the age distribution of two groups, A and B:

    Group   Ages (years)          Average
    A       22  24  25  26  28    25
    B        8  15  20  28  54    25

    The two groups above have the same average, 25 years, so we are likely to conclude that the two groups are similar.

    That conclusion is wrong. The observations in group A are close to one another, indicating that people in this group are all roughly 22 to 28 years old,

    while those in group B are widely dissimilar and have greater variability of ages: the group includes a person who is 8 years old on one hand and a person aged 54 on the other.

    This means that a central value alone does not give a clear indication of the pattern of the distribution.

    A measure of dispersion or variability gives us information about the spread of the observations in a distribution.

    Here, the dispersion of group B is greater than that of group A.
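    The comparison of the two groups can be checked numerically. The following is a minimal Python sketch and not part of the original slides; the names group_a, group_b and value_range are illustrative choices, and the range is used here only as the simplest possible spread measure.

```python
# Ages from the example table (groups A and B).
group_a = [22, 24, 25, 26, 28]
group_b = [8, 15, 20, 28, 54]


def value_range(values):
    """Return the simple range: largest value minus smallest value."""
    return max(values) - min(values)


mean_a = sum(group_a) / len(group_a)
mean_b = sum(group_b) / len(group_b)
range_a = value_range(group_a)
range_b = value_range(group_b)

# Both groups share the same mean, but their spreads differ sharply.
print(f"Group A: mean = {mean_a}, range = {range_a}")
print(f"Group B: mean = {mean_b}, range = {range_b}")
```

    Identical means with very different ranges are exactly the situation the slides warn about: the average alone hides the difference between the groups.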

    Purpose of Measuring Variation

    To test the reliability of an average

    To serve as a basis for control of variability

    To compare two or more series with regard to variability

    To serve as a basis for further statistical analysis

    Properties of a good measure of variation

    It should be simple to understand and easy to calculate.

    It should be based on all observations.

    It should be amenable to further algebraic treatment.

    It should not be affected by extreme observations.


    Measures of variation

    Absolute measures

    Range

    Quartile deviation

    Mean deviation

    Standard deviation / variance

    Lorenz curve

    Relative measures

    Coefficient of range

    Coefficient of variation

    Coefficient of quartile deviation

    Coefficient of mean deviation

    Absolute measures of variation

    They are expressed in the same statistical unit in which the original data are given, such as rupees, kg, etc.

    These values are used to compare the variation in two or more distributions, provided the variables are expressed in the same units and have almost the same average value.

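    Relative measures of variation

    An absolute measure of dispersion expresses variation in the same units as the original data, so it cannot directly compare series measured in different units or with very different averages.

    To compare the variation of two different series, a relative (unit-free) measure is calculated, for example by expressing the standard deviation relative to the mean.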

    Range

    Range is the preliminary indicator of dispersion.

    The (total or simple) range is the maximum (highest) value observed in the data [the value of the case at the 100th percentile] minus the minimum (lowest) value observed in the data [the value of the case at the 0th percentile].

    That is, it is the distance or interval between the values of these two most extreme cases.

    It indicates how spread out the data are.

    Open-ended distributions have no range because no highest or lowest value exists in an open-ended class.

    The Range

    The range is defined as the difference between the largest score in the set of data and the smallest score in the set of data, XL - XS.

    What is the range of the following data?

    4 8 1 6 6 2 9 3 6 9

    The largest score (XL) is 9; the smallest score (XS) is 1; the range is XL - XS = 9 - 1 = 8.

    Coefficient of scatter

    Ratio of range

    Coefficient of range = (Max - Min) / (Max + Min) = Absolute range / Sum of the extreme values

    Dispersion Example

    Number of minutes 20 clients waited to see a consultant

    Consultant X    Consultant Y
    05  15          11  12
    12  03          10  13
    04  19          11  10
    37  11          09  13
    06  34          09  11

    Consultant X:

    Sees some clients almost immediately

    Others wait over 1/2 hour

    Highly inconsistent

    Consultant Y:

    Clients wait about 10 minutes

    9 minutes least wait and 13 minutes most

    Highly consistent

    Solution

    1. Coefficient of range for Consultant X

    = (Max - Min) / (Max + Min)

    = (37 - 03) / (37 + 03) = 34/40 = 0.85

    2. Coefficient of range for Consultant Y

    = (Max - Min) / (Max + Min)

    = (13 - 09) / (13 + 09) = 4/22 = 0.18

    Consultant X is inconsistent and Consultant Y is consistent in their job.
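    The two coefficients can be reproduced from the waiting times. This is a small Python sketch added for illustration; the function name coefficient_of_range is a hypothetical helper, and the waiting times are read off the consultant table in the example.

```python
# Waiting times in minutes, taken from the consultant example.
consultant_x = [5, 15, 12, 3, 4, 19, 37, 11, 6, 34]
consultant_y = [11, 12, 10, 13, 11, 10, 9, 13, 9, 11]


def coefficient_of_range(values):
    """(Max - Min) / (Max + Min), a unit-free version of the range."""
    largest, smallest = max(values), min(values)
    return (largest - smallest) / (largest + smallest)


coef_x = coefficient_of_range(consultant_x)
coef_y = coefficient_of_range(consultant_y)

print(f"Consultant X: {coef_x:.2f}")
print(f"Consultant Y: {coef_y:.2f}")
```

    The larger coefficient for Consultant X reflects the much wider spread of waiting times relative to their size.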

    Uses

    QUALITY CONTROL:

    The objective of quality control is to keep a check on the quality of the product without 100% inspection.

    When statistical methods of quality control are used, control charts are prepared in which the range plays an important role.

    The basic idea is that as long as manufactured products conform to set standards (range), the production process is assumed to be in control.

    WEATHER FORECASTS:

    The range helps the general public know within what limits the temperature is likely to vary on a particular day.

    Quartile deviation

    It measures the distance between the lowest and highest of the middle 50 percent of the scores of a distribution.

    Q.D. is superior to the range, as it is based not on the two extreme values but on the middle 50% of observations.

    It can be calculated for distributions with open-ended classes.

    It is often used with skewed data as it is insensitive to extreme scores.

    Interquartile Range

    Interquartile range = Q3 - Q1

    Semi-interquartile range or quartile deviation is defined as

    = (Q3 - Q1)/2

    Coefficient of quartile deviation is

    = (Q3 - Q1)/(Q3 + Q1)

    When Q.D. is small, it indicates high uniformity of the central 50% of observations.

    A high Q.D. means high variation among the central observations.

    Interquartile Range Example

    The number of complaints received by the manager of a supermarket was recorded for each of the last 10 working days.

    21, 15, 18, 5, 10, 17, 21, 19, 25 & 28

    Sorted data

    5, 10, 15, 17, 18, 19, 21, 21, 25 & 28

    $$Q_1 = \frac{n+1}{4}\text{th observation} = \frac{11}{4} = 2.75 \approx 3\text{rd observation}, \qquad Q_1 = 15$$

    $$Q_3 = \frac{3(n+1)}{4}\text{th observation} = \frac{33}{4} = 8.25 \approx 8\text{th observation}, \qquad Q_3 = 21$$

    Interquartile range = 21 - 15 = 6 complaints
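    The quartile positions in this example can be traced in code. The sketch below is an illustration in Python, not the slide author's method: nearest_observation is a hypothetical helper that rounds the position (n + 1)/4 to the nearest whole observation, as the slide does. For comparison it also calls the standard library's statistics.quantiles, which interpolates between neighbouring observations and therefore gives slightly different quartiles.

```python
import statistics

complaints = [21, 15, 18, 5, 10, 17, 21, 19, 25, 28]
ordered = sorted(complaints)
n = len(ordered)


def nearest_observation(position):
    """Value at a 1-based position, rounded to the nearest whole observation."""
    return ordered[round(position) - 1]


q1 = nearest_observation((n + 1) / 4)       # position 2.75 -> 3rd observation
q3 = nearest_observation(3 * (n + 1) / 4)   # position 8.25 -> 8th observation
iqr = q3 - q1

# Interpolated quartiles from the standard library (default 'exclusive' method).
q1_interp, _median, q3_interp = statistics.quantiles(complaints, n=4)

print(f"Rounded positions: Q1 = {q1}, Q3 = {q3}, IQR = {iqr}")
print(f"Interpolated:      Q1 = {q1_interp}, Q3 = {q3_interp}")
```

    Rounding to the nearest observation is the simplest convention; interpolation is more common in software, so results from different tools may not match the slide exactly.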

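    Calculating exactly: Q1

    Using the formula:

    $$Q_1 = l + \frac{N/4 - F}{f} \times i$$

    where l is the lower limit of the quartile group, i is its width, f is its frequency, F is the cumulative frequency of the previous group, and N is the total frequency.

    X          f     CF
    0 < 20     15    15
    20 < 40    60    75
    40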

    Q3

    Third Quartile

    This is in the group 20 < 40

    Lower limit (l) is 20

    Width of group (i) is 20

    Frequency of group (f) is 60

    CF of previous group (F) is 15

    X          f     CF
    0 < 20     15    15
    20 < 40    60    75
    40
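    The grouped-data quartile calculation can be sketched as follows. This Python fragment is illustrative only. The interpolation formula l + ((position - F) / f) * i is the standard one and is assumed here, since the slide's formula image did not survive. The total frequency of 100 is also an assumption: it is not stated in the table, but it is the value that reproduces the quartiles 23.333 and 40 used in the interquartile range calculation. The function name grouped_quartile is hypothetical.

```python
def grouped_quartile(position, lower_limit, width, frequency, previous_cf):
    """Interpolate a quartile inside its class: l + ((position - F) / f) * i."""
    return lower_limit + (position - previous_cf) / frequency * width


# Assumption: total frequency N = 100 (inferred, not given in the slides).
total_frequency = 100

# Values for the group 20 < 40, as listed on the slide.
group = dict(lower_limit=20, width=20, frequency=60, previous_cf=15)

q1 = grouped_quartile(total_frequency / 4, **group)       # position N/4 = 25
q3 = grouped_quartile(3 * total_frequency / 4, **group)   # position 3N/4 = 75

print(f"Q1 = {q1:.3f}, Q3 = {q3:.3f}")
```

    Under that assumption both quartile positions (25 and 75) fall in the 20 < 40 group, because its cumulative frequency runs from 15 to 75.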

    Interquartile Range and Coefficient of Q.D.

    Interquartile range = Q3 - Q1 = 40 - 23.333 = 16.67

    Semi-interquartile range or quartile deviation is defined as

    = (Q3 - Q1)/2 = 16.67/2 = 8.335

    Coefficient of quartile deviation is

    = (Q3 - Q1)/(Q3 + Q1) = 16.67/63.33 = 0.26

    Example

    Use an appropriate measure to evaluate the variation in the following data:

    Weekly income (Rs.)    No. of workers
    below 1350              8
    1350-1370              16
    1370-1390              39
    1390-1410              58
    1410-1430              60
    1430-1450              40
    1450-1470              22
    1470-1490              15
    1490-1510              15
    1510-1530               9
    1530 and above         10
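    One possible approach to this exercise, offered as a sketch and not as the slides' official solution: because the first and last classes are open-ended, the range, mean deviation and standard deviation cannot be computed directly, so the quartile deviation is a natural choice. The Python code below assumes the standard grouped-data interpolation formula and positions N/4 and 3N/4; the names classes and grouped_quantile are illustrative.

```python
# (lower bound, upper bound, number of workers); None marks an open end.
classes = [
    (None, 1350, 8),
    (1350, 1370, 16),
    (1370, 1390, 39),
    (1390, 1410, 58),
    (1410, 1430, 60),
    (1430, 1450, 40),
    (1450, 1470, 22),
    (1470, 1490, 15),
    (1490, 1510, 15),
    (1510, 1530, 9),
    (1530, None, 10),
]


def grouped_quantile(table, fraction):
    """Interpolate the value below which `fraction` of the observations fall."""
    total = sum(freq for _, _, freq in table)
    target = fraction * total
    cumulative = 0
    for lower, upper, freq in table:
        if cumulative + freq >= target:
            if lower is None or upper is None:
                raise ValueError("quantile falls in an open-ended class")
            return lower + (target - cumulative) / freq * (upper - lower)
        cumulative += freq
    raise ValueError("fraction must be between 0 and 1")


q1 = grouped_quantile(classes, 0.25)
q3 = grouped_quantile(classes, 0.75)
quartile_deviation = (q3 - q1) / 2
coefficient_qd = (q3 - q1) / (q3 + q1)

print(f"Q1 = {q1:.2f}, Q3 = {q3:.2f}")
print(f"Q.D. = {quartile_deviation:.2f}, coefficient = {coefficient_qd:.4f}")
```

    Neither quartile falls in an open-ended class here, which is why the quartile deviation remains computable for this table.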


    Problems with quartile Deviation

    It is not based on all the observations

    Affected by sampling fluctuations

    Not suitable for further algebraic treatment

    Deviation Measures of Dispersion (cont.)

    The deviation from the mean for a representative case i is x_i - mean of x.

    If almost all of these deviations are small, dispersion is small.

    If many of these deviations are large, dispersion is large.

    This suggests we could construct a measure D of dispersion that would simply be the average (mean) of all the deviations.

    But this will not work because, as we saw earlier, it is a property of the mean that all deviations from it add up to zero.
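    A short numerical check makes the problem concrete. This Python sketch is an added illustration; it reuses the group B ages from the earlier example, and the variable names are arbitrary.

```python
# Group B ages from the earlier example.
ages = [8, 15, 20, 28, 54]

mean_age = sum(ages) / len(ages)

# Signed deviation of each observation from the mean.
deviations = [x - mean_age for x in ages]

# Positive and negative deviations cancel, so their sum (and mean) is zero.
sum_of_deviations = sum(deviations)

print(f"mean = {mean_age}")
print(f"deviations = {deviations}")
print(f"sum of deviations = {sum_of_deviations}")
```

    However spread out the data are, the signed deviations always cancel, so their average carries no information about dispersion.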

    Deviation Measures of Dispersion: Example (cont.)

    The Mean Deviation

    A practical way around this problem is simply to ignore the fact that some deviations are negative while others are positive by averaging the absolute values of the deviations.

    This measure (called the mean deviation) tells us the average (mean) amount by which the values for all cases deviate (regardless of whether they are higher or lower) from the average (mean) value.

    Indeed, the mean deviation is an intuitive, understandable, and perfectly reasonable measure of dispersion, and it is occasionally used in research.

    The mean deviation takes into consideration all of the values.


    The Mean Deviation (cont.)

    Frequency Distribution Mean Deviation

    If the data are in the form of a frequency distribution, the mean deviation can be calculated using the following formula:

    $$MD = \frac{\sum f\,|x - \bar{x}|}{\sum f}$$

    Where: f = the frequency of an observation x

    n = Σf = the sum of the frequencies

    This measure is an improvement over the previous two measures in the sense that it considers all observations of a data set.

    Coefficient of mean deviation

    Coefficient of mean deviation = Mean deviation / Mean

    Example

    Find the mean deviation for the following distribution of demand for a book.

    Quantity demanded    Frequency    fx      |x - x̄|    f|x - x̄|
    (in units) x         f
     6                    4            24     17.6        70.4
    12                    7            84     11.6        81.2
    18                   10           180      5.6        56.0
    24                   18           432      0.4         7.2
    30                   12           360      6.4        76.8
    36                    7           252     12.4        86.8
    42                    2            84     18.4        36.8
    Total                60           Σfx = 1416          Σf|x - x̄| = 415.2

    $$\bar{x} = \frac{\sum fx}{\sum f} = \frac{1416}{60} = 23.6$$

    $$MD = \frac{\sum f\,|x - \bar{x}|}{\sum f} = \frac{415.2}{60} = 6.92$$
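    The table arithmetic can be verified with a few lines of code. This is a minimal Python sketch added for illustration, with arbitrary variable names; it also computes the coefficient of mean deviation (mean deviation divided by the mean), which the slides define but do not evaluate for this example.

```python
# Quantity demanded (x) and frequency (f) from the example table.
quantities = [6, 12, 18, 24, 30, 36, 42]
frequencies = [4, 7, 10, 18, 12, 7, 2]

total_f = sum(frequencies)

# Weighted mean: sum of f*x divided by sum of f.
mean = sum(f * x for x, f in zip(quantities, frequencies)) / total_f

# Mean deviation: weighted average of absolute deviations from the mean.
mean_deviation = (
    sum(f * abs(x - mean) for x, f in zip(quantities, frequencies)) / total_f
)

# Relative version: mean deviation expressed as a fraction of the mean.
coefficient_md = mean_deviation / mean

print(f"mean = {mean:.2f}")
print(f"mean deviation = {mean_deviation:.2f}")
print(f"coefficient of mean deviation = {coefficient_md:.4f}")
```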

    Problems with Mean Deviation

    Algebraic signs are ignored while taking the deviations of the items.

    Cannot be computed for distributions with open-ended classes.

    Not suitable for further mathematical treatment.

    Standard Deviation

    Standard deviation is the most commonly used measure of dispersion

    Similar to the mean deviation, the standard deviation takes into account the value of every observation

    It is a measure of the degree of dispersion of the data from the mean value.

    First, the formula says to subtract the mean from each of the scores

    This difference is called a deviate or a deviation score

    The deviate tells us how far a given score is from the typical, or average, score

    Thus, the deviate is a measure of dispersion for a given score

    It is a statistic that tells us how tightly all the various values are clustered around the mean in a set of data.

    A large S.D. indicates that the data points are far from the mean.

    A small S.D. indicates that all the data points cluster closely around the mean.

    Standard Deviation

    It is the positive square root of the arithmetic mean of the squares of the deviations of the observations from their arithmetic mean.

    Calculation:

    Calculate the arithmetic mean (AM)

    Subtract the AM from each individual value

    Square each deviation (multiply it by itself)

    Sum (total) the squared deviations

    Divide the total by the number of values (n)

    Calculate the square root of the result

    Formula:

    $$\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$$
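    The calculation steps can be followed one by one in code. The Python sketch below is an added illustration using the group A ages from the earlier example; it divides by n, as in the formula on this slide, and compares the result with the standard library's statistics.pstdev (the population standard deviation).

```python
import math
import statistics

# Group A ages from the earlier example.
ages = [22, 24, 25, 26, 28]

# Step 1: arithmetic mean.
arithmetic_mean = sum(ages) / len(ages)

# Step 2: deviation of each value from the mean.
deviations = [x - arithmetic_mean for x in ages]

# Step 3: square each deviation.
squared_deviations = [d ** 2 for d in deviations]

# Steps 4 and 5: sum the squares and divide by the number of values.
variance = sum(squared_deviations) / len(ages)

# Step 6: take the positive square root.
standard_deviation = math.sqrt(variance)

print(f"mean = {arithmetic_mean}, variance = {variance}, SD = {standard_deviation}")
print(f"statistics.pstdev gives {statistics.pstdev(ages)}")
```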

    The Mean, Deviations, Variance, and SD

    What is the effect of adding a constant amount to (or subtracting it from) each observed value?

    What is the effect of multiplying each observed value by (or dividing it by) a constant amount?

    Adding (subtracting) the same amount to (from) every observed value changes the mean by the same amount but does not change the dispersion (for either range or deviation measures).

    Multiplying every observed value by the same factor changes the mean and the SD [or MD] by that same factor and changes the variance by that factor squared. (For a negative factor, the SD and MD change by its absolute value.)
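    Both effects are easy to confirm numerically. This Python sketch is an added illustration; the data (the group A ages), the shift of 10 and the factor of 3 are arbitrary choices, and summary is a hypothetical helper built on the standard statistics module.

```python
import statistics

values = [22, 24, 25, 26, 28]
shifted = [x + 10 for x in values]   # add a constant to every value
scaled = [x * 3 for x in values]     # multiply every value by a factor


def summary(data):
    """Return (mean, population SD, population variance)."""
    return (
        statistics.mean(data),
        statistics.pstdev(data),
        statistics.pvariance(data),
    )


base_mean, base_sd, base_var = summary(values)
shift_mean, shift_sd, shift_var = summary(shifted)
scale_mean, scale_sd, scale_var = summary(scaled)

print("original:", base_mean, base_sd, base_var)
print("plus 10: ", shift_mean, shift_sd, shift_var)
print("times 3: ", scale_mean, scale_sd, scale_var)
```

    Shifting moves the mean but leaves the SD and variance alone, while scaling by 3 triples the mean and SD and multiplies the variance by 9.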

    Usefulness

    Manufacturers interested in producing items of consistent quality are very much concerned with the S.D.

    If the mean life of a component is 4 years and the S.D. is very large, it would correspond to many failures long before 4 years.

    Quality control requires consistency, and consistency requires a relatively small S.D.

    Variance

    The square of the standard deviation. More useful when we begin analysis rather than description:

    $$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$

    (This form, with n - 1 in the denominator, is the sample variance.)

    What Does the Variance Formula Mean?

    Variance is the mean of the squared deviation scores

    The larger the variance is, the more the scores deviate, on average, away from the mean

    The smaller the variance is, the less the scores deviate, on average, from the mean

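    Combined Variance (For different means)

    $$\sigma_{12}^2 = \frac{n_1(\sigma_1^2 + d_1^2) + n_2(\sigma_2^2 + d_2^2)}{n_1 + n_2}$$

    where $d_1 = \bar{x}_1 - \bar{x}_{12}$ and $d_2 = \bar{x}_2 - \bar{x}_{12}$ are the deviations of each group mean from the combined mean $\bar{x}_{12}$.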

    Exercise 3

    The mean and S.D. of the lives of tyres manufactured by two factories of the Durable Tyre Company, each making 50,000 tyres annually, are given below. Calculate the combined mean and standard deviation of the life of all the 100,000 tyres produced in a year.

    Factory    Sample size    Mean ('000 km)    SD ('000 km)
    1          50             60                8
    2          50             50                7
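    A possible way to work this exercise, shown as a hedged Python sketch and not as the slides' official answer. It applies the combined-variance formula for groups with different means; the function name combined_mean_sd is hypothetical, and the sample sizes from the table are used as the weights (because the two factories are weighted equally, using 50,000 each would give the same result).

```python
import math


def combined_mean_sd(n1, mean1, sd1, n2, mean2, sd2):
    """Combined mean and SD of two groups that may have different means."""
    total = n1 + n2
    combined_mean = (n1 * mean1 + n2 * mean2) / total
    # Deviation of each group mean from the combined mean.
    d1 = mean1 - combined_mean
    d2 = mean2 - combined_mean
    combined_variance = (
        n1 * (sd1 ** 2 + d1 ** 2) + n2 * (sd2 ** 2 + d2 ** 2)
    ) / total
    return combined_mean, math.sqrt(combined_variance)


# Factory 1: n = 50, mean = 60, SD = 8; factory 2: n = 50, mean = 50, SD = 7
# (means and SDs in thousands of km).
combined_mean, combined_sd = combined_mean_sd(50, 60, 8, 50, 50, 7)

print(f"combined mean = {combined_mean} thousand km")
print(f"combined SD   = {combined_sd:.2f} thousand km")
```

    The combined SD exceeds both factory SDs because the two factory means differ, and that between-group difference adds to the overall spread.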

    Combined Variance (For same means)

    $$\sigma_{12}^2 = \frac{n_1\sigma_1^2 + n_2\sigma_2^2}{n_1 + n_2}$$

    Example

    The following data relate to clients obtained by insurance agents during a given period for two types of insurance policies, a child policy and a retirement policy. Calculate the combined S.D.

                                     Child policy    Retirement policy
    No. of agents                    25              18
    Average no. of clients booked    72              64
    Variance of the distribution     8               6

    The Coefficient of Variation

    It is the most important relative measure of dispersion.

    One ratio measure of dispersion/inequality is the coefficient of variation, which is simply the standard deviation divided by the mean.

    It answers the question: how big is the SD relative to the mean?

    $$\text{coefficient of variation} = \frac{s}{\bar{x}} \times 100$$
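    As an illustration, the coefficient of variation can be computed for the two age groups from the earlier example. This Python sketch is an addition to the slides; it assumes the population standard deviation (statistics.pstdev, dividing by n), and the choice between n and n - 1 changes the numbers slightly but not the comparison.

```python
import statistics


def coefficient_of_variation(values):
    """Standard deviation as a percentage of the mean."""
    # Assumption: population SD (divide by n); use statistics.stdev for n - 1.
    return statistics.pstdev(values) / statistics.mean(values) * 100


group_a = [22, 24, 25, 26, 28]
group_b = [8, 15, 20, 28, 54]

cv_a = coefficient_of_variation(group_a)
cv_b = coefficient_of_variation(group_b)

print(f"C.V. of group A = {cv_a:.1f}%")
print(f"C.V. of group B = {cv_b:.1f}%")
```

    Both groups have the same mean, so the much larger C.V. of group B reflects only its larger standard deviation.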

    It is therefore a useful statistic for comparing the degree of variation from one data series to another.

    It helps us determine how much volatility (risk) we are assuming in comparison to the amount of return one can expect from an investment.

    The lower the coefficient of variation, the better the risk-return tradeoff.

    The distribution for which the C.V. is greater is said to be less stable, less uniform, less consistent, and less homogeneous.

    Measure of Skew

    Skew is a measure of asymmetry in the distribution of scores

    [Figure: three distribution curves labelled Positive Skew, Negative Skew, and Normal (skew = 0)]

    Measure of Skewness

    A measure of the skewness of a distribution is given by

    = 3(mean - median) / S.D.

    This measure is known as Karl Pearson's coefficient of skewness and lies between -3 and +3.
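    Pearson's coefficient can be evaluated for any small data set. The Python sketch below is an added illustration that reuses the supermarket complaints data from the interquartile range example; pearson_skewness is a hypothetical helper, and the population standard deviation is assumed.

```python
import statistics


def pearson_skewness(values):
    """Karl Pearson's coefficient: 3 * (mean - median) / SD."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    # Assumption: population SD (divide by n).
    sd = statistics.pstdev(values)
    return 3 * (mean - median) / sd


complaints = [21, 15, 18, 5, 10, 17, 21, 19, 25, 28]
skew = pearson_skewness(complaints)

print(f"Pearson's coefficient of skewness = {skew:.3f}")
```

    Here the mean (17.9) is below the median (18.5), so the coefficient is slightly negative, indicating a mild negative skew.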

    A distribution is said to be symmetric if mean = median = mode

    A distribution is said to be positively skewed if mean > median > mode

    A distribution is said to be negatively skewed if mean < median < mode

    The smaller the absolute value of the coefficient, the less the skewness. If the coefficient of skewness is 0, the data are exactly balanced (symmetric).

    [Figure: bell-shaped curve showing the relationship between σ and μ, with the horizontal axis marked at μ - 3σ, μ - 2σ, μ - 1σ, μ, μ + 1σ, μ + 2σ, μ + 3σ; about 68% of values lie within μ ± 1σ, 95% within μ ± 2σ, and 99.7% within μ ± 3σ]