Describing Data: 2. Numerical summaries of data using measures of central tendency and dispersion
Post on 20-Dec-2015
![Page 1: DESCRIBING DATA: 2. Numerical summaries of data using measures of central tendency and dispersion](https://reader036.vdocument.in/reader036/viewer/2022062407/56649d445503460f94a20cb4/html5/thumbnails/1.jpg)
DESCRIBING DATA: 2
![Page 2: DESCRIBING DATA: 2. Numerical summaries of data using measures of central tendency and dispersion](https://reader036.vdocument.in/reader036/viewer/2022062407/56649d445503460f94a20cb4/html5/thumbnails/2.jpg)
Numerical summaries of data
using measures of central tendency
and dispersion
![Page 3: DESCRIBING DATA: 2. Numerical summaries of data using measures of central tendency and dispersion](https://reader036.vdocument.in/reader036/viewer/2022062407/56649d445503460f94a20cb4/html5/thumbnails/3.jpg)
Central tendency--Mode
| Major | F |
|---|---|
| Anthropology | 97 |
| Economics | 104 |
| Geography | 57 |
| Political Science | 110 |
| Sociology | 82 |
Table 1. Undergraduate Majors
![Page 4: DESCRIBING DATA: 2. Numerical summaries of data using measures of central tendency and dispersion](https://reader036.vdocument.in/reader036/viewer/2022062407/56649d445503460f94a20cb4/html5/thumbnails/4.jpg)
Bimodal Distributions
| Major | F |
|---|---|
| Anthropology | 97 |
| Economics | 110 |
| Geography | 57 |
| Political Science | 110 |
| Sociology | 82 |
Table 1. Undergraduate Majors
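A minimal sketch of finding the mode from a frequency table, using the counts from the two versions of Table 1. The function name `modes` is an illustrative choice, not something from the slides; it returns every category tied for the highest frequency, so a bimodal table yields two entries.

```python
def modes(freq):
    """Return every category tied for the highest frequency."""
    top = max(freq.values())
    return [category for category, f in freq.items() if f == top]

# Frequencies from Table 1 (single mode)
majors = {
    "Anthropology": 97,
    "Economics": 104,
    "Geography": 57,
    "Political Science": 110,
    "Sociology": 82,
}

# Same table with Economics raised to 110 (the bimodal version)
majors_bimodal = dict(majors, Economics=110)

print(modes(majors))          # one modal category
print(modes(majors_bimodal))  # two modal categories
```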
![Page 5: DESCRIBING DATA: 2. Numerical summaries of data using measures of central tendency and dispersion](https://reader036.vdocument.in/reader036/viewer/2022062407/56649d445503460f94a20cb4/html5/thumbnails/5.jpg)
Mode for Grouped Frequency Distributions based on Interval Data

| Mean daily temp. (degrees) | Place A (f) | Place B (f) |
|---|---|---|
| 10-19.9 | 5 | 0 |
| 20-29.9 | 5 | 5 |
| 30-39.9 | 20 | 10 |
| 40-49.9 | 30 | 15 |
| 50-59.9 | 20 | 30 |
| 60-69.9 | 20 | 40 |
Midpoint of the modal class interval
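As a rough sketch, the modal-class midpoint for each place could be computed as below. It assumes each class such as 10-19.9 is treated as the interval from 10 up to (but not including) 20, which gives midpoints of 15, 25, and so on; the function name `grouped_mode` is hypothetical.

```python
# Class limits, assuming 10-19.9 means the interval [10, 20), etc.
classes = [(10, 20), (20, 30), (30, 40), (40, 50), (50, 60), (60, 70)]
place_a = [5, 5, 20, 30, 20, 20]
place_b = [0, 5, 10, 15, 30, 40]

def grouped_mode(classes, freqs):
    """Midpoint of the class with the highest frequency (first one on ties)."""
    i = max(range(len(freqs)), key=lambda k: freqs[k])
    lower, upper = classes[i]
    return (lower + upper) / 2

mode_a = grouped_mode(classes, place_a)  # modal class 40-49.9
mode_b = grouped_mode(classes, place_b)  # modal class 60-69.9
print(mode_a, mode_b)
```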
![Page 6: DESCRIBING DATA: 2. Numerical summaries of data using measures of central tendency and dispersion](https://reader036.vdocument.in/reader036/viewer/2022062407/56649d445503460f94a20cb4/html5/thumbnails/6.jpg)
Median
• The point in the distribution above which and below which exactly half the observations lie (50th percentile)
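• Calculation depends on whether the number of observations is odd or even: with an odd number, the median is the middle value; with an even number, it is the mean of the two middle values.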
![Page 7: DESCRIBING DATA: 2. Numerical summaries of data using measures of central tendency and dispersion](https://reader036.vdocument.in/reader036/viewer/2022062407/56649d445503460f94a20cb4/html5/thumbnails/7.jpg)
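| Distribution 1 (n=5) | Distribution 2 (n=6) |
|---|---|
| 198 | 197 |
| 179 | 193 |
| 172 | 189 |
| 167 | 187 |
| 154 | 183 |
| | 179 |

Median = 172 for Distribution 1 (the middle value); Median = 188 for Distribution 2 (the mean of 189 and 187)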
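The odd/even rule can be sketched in a few lines. This is an illustrative implementation (the standard library's `statistics.median` does the same job); the two lists are the values from Distribution 1 and Distribution 2.

```python
def median(values):
    """Middle value for odd n; mean of the two middle values for even n."""
    s = sorted(values)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2

dist1 = [198, 179, 172, 167, 154]       # n = 5 (odd)
dist2 = [197, 193, 189, 187, 183, 179]  # n = 6 (even)

print(median(dist1))
print(median(dist2))
```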
![Page 8: DESCRIBING DATA: 2. Numerical summaries of data using measures of central tendency and dispersion](https://reader036.vdocument.in/reader036/viewer/2022062407/56649d445503460f94a20cb4/html5/thumbnails/8.jpg)
MEDIAN for grouped frequency distributions based on interval data
| Mean daily temp. (degrees) | f | Cumulative f |
|---|---|---|
| 10-19.9 | 5 | 5 |
| 20-29.9 | 5 | 10 |
| 30-39.9 | 20 | 30 |
| 40-49.9 | 30 | 60 |
| 50-59.9 | 20 | 80 |
| 60-69.9 | 20 | 100 |
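The median is the 50th of the 100 observations. It falls in the 40-49.9 class, which holds 30 observations and has 30 observations below it, so 50 - 30 = 20 of its 30 observations are needed:
Median = 40 + ((20/30) * 10) = 40 + 6.67 = 46.67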
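The interpolation above can be sketched as a small function. It assumes equal class widths of 10 and that observations are spread evenly within the median class; the name `grouped_median` and its parameters are illustrative only.

```python
def grouped_median(lower_bounds, freqs, width):
    """Linear interpolation inside the class that contains the n/2-th case."""
    n = sum(freqs)
    if n == 0:
        raise ValueError("empty distribution")
    half = n / 2
    cumulative = 0  # observations below the current class
    for lower, f in zip(lower_bounds, freqs):
        if f > 0 and cumulative + f >= half:
            return lower + (half - cumulative) / f * width
        cumulative += f

lower_bounds = [10, 20, 30, 40, 50, 60]
freqs = [5, 5, 20, 30, 20, 20]

median_temp = grouped_median(lower_bounds, freqs, width=10)
print(round(median_temp, 2))
```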
![Page 9: DESCRIBING DATA: 2. Numerical summaries of data using measures of central tendency and dispersion](https://reader036.vdocument.in/reader036/viewer/2022062407/56649d445503460f94a20cb4/html5/thumbnails/9.jpg)
ARITHMETIC MEAN
$$\bar{Y} = \frac{\sum y_i}{n}$$

$$\bar{y} = \frac{1 + 1 + 3 + 3 + 6 + 7 + 7}{7} = \frac{28}{7} = 4$$
![Page 10: DESCRIBING DATA: 2. Numerical summaries of data using measures of central tendency and dispersion](https://reader036.vdocument.in/reader036/viewer/2022062407/56649d445503460f94a20cb4/html5/thumbnails/10.jpg)
Mean for Grouped Data
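| Mean daily temp. (degrees) | f | Midpoint of interval | f times midpoint |
|---|---|---|---|
| 10-19.9 | 5 | 15 | 75 |
| 20-29.9 | 5 | 25 | 125 |
| 30-39.9 | 20 | 35 | 700 |
| 40-49.9 | 30 | 45 | 1350 |
| 50-59.9 | 20 | 55 | 1100 |
| 60-69.9 | 20 | 65 | 1300 |
| Totals | 100 | | 4650 |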
Mean = sum of weighted midpoints / n = 4650/100=46.5
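The same calculation as a short sketch, using the frequencies and class midpoints from the table. Variable names are illustrative.

```python
midpoints = [15, 25, 35, 45, 55, 65]
freqs = [5, 5, 20, 30, 20, 20]

# Weight each class midpoint by its frequency, then divide by n
weighted_sum = sum(f * m for f, m in zip(freqs, midpoints))
n = sum(freqs)
grouped_mean = weighted_sum / n

print(weighted_sum, n, grouped_mean)
```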
![Page 11: DESCRIBING DATA: 2. Numerical summaries of data using measures of central tendency and dispersion](https://reader036.vdocument.in/reader036/viewer/2022062407/56649d445503460f94a20cb4/html5/thumbnails/11.jpg)
Mean is the balancing point of the distribution
[Figure: dot plot of seven observations (X) on a number line from 0 to 9, with the mean marked as the balancing point]
![Page 12: DESCRIBING DATA: 2. Numerical summaries of data using measures of central tendency and dispersion](https://reader036.vdocument.in/reader036/viewer/2022062407/56649d445503460f94a20cb4/html5/thumbnails/12.jpg)
Key Properties of the Mean
• Sum of the differences between the individual scores and the mean equals 0
• Sum of the squared differences between the individual scores and the mean equals a minimum value.

$$\sum (Y - \bar{Y}) = 0$$

$$\sum (Y - \bar{Y})^2 = \text{the minimum value}$$
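Both properties can be checked numerically. The sketch below reuses the seven scores from the arithmetic mean example (1, 1, 3, 3, 6, 7, 7) and compares the sum of squared differences around the mean with the sum around a few other candidate centers; the helper names are illustrative.

```python
y = [1, 1, 3, 3, 6, 7, 7]
mean = sum(y) / len(y)

def sum_of_deviations(center):
    """Sum of (Y - center) over all scores."""
    return sum(v - center for v in y)

def sum_of_squares(center):
    """Sum of (Y - center)^2 over all scores."""
    return sum((v - center) ** 2 for v in y)

print(sum_of_deviations(mean))  # property 1: deviations cancel out
for center in (3, 3.9, mean, 4.1, 5):
    # property 2: the sum of squares is smallest when center == mean
    print(center, sum_of_squares(center))
```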
![Page 13: DESCRIBING DATA: 2. Numerical summaries of data using measures of central tendency and dispersion](https://reader036.vdocument.in/reader036/viewer/2022062407/56649d445503460f94a20cb4/html5/thumbnails/13.jpg)
Weaknesses of each measure of central tendency
• MODE: ignores all other info. about values except the most frequent one
• MEDIAN: ignores the LOCATION of scores above or below the midpoint
• MEAN: is the most sensitive to extreme values
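To illustrate the last point, the sketch below takes the seven scores from the arithmetic mean example and replaces the largest score with a hypothetical extreme value of 70 (an assumption made purely for illustration). The mean shifts substantially while the median does not move.

```python
import statistics

scores = [1, 1, 3, 3, 6, 7, 7]
with_extreme = [1, 1, 3, 3, 6, 7, 70]  # hypothetical outlier replaces one 7

mean_before = statistics.mean(scores)
mean_after = statistics.mean(with_extreme)
median_before = statistics.median(scores)
median_after = statistics.median(with_extreme)

print(mean_before, mean_after)      # the mean is pulled toward the outlier
print(median_before, median_after)  # the median is unchanged
```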
![Page 14: DESCRIBING DATA: 2. Numerical summaries of data using measures of central tendency and dispersion](https://reader036.vdocument.in/reader036/viewer/2022062407/56649d445503460f94a20cb4/html5/thumbnails/14.jpg)
Impacts of skewed distributions
[Figure: skewed distribution curve with the positions of the mode, median, and mean marked]
![Page 15: DESCRIBING DATA: 2. Numerical summaries of data using measures of central tendency and dispersion](https://reader036.vdocument.in/reader036/viewer/2022062407/56649d445503460f94a20cb4/html5/thumbnails/15.jpg)
Measures of Dispersion
Poverty Households (%) in 2 suburbs by tract

| Suburb A | Suburb B |
|---|---|
| 24 | 28 |
| 23 | 25 |
| 22 | 22 |
| 21 | 19 |
| 20 | 16 |
| Mean=22 | Mean=22 |
| Less dispersion | More dispersion |
![Page 16: DESCRIBING DATA: 2. Numerical summaries of data using measures of central tendency and dispersion](https://reader036.vdocument.in/reader036/viewer/2022062407/56649d445503460f94a20cb4/html5/thumbnails/16.jpg)
Range
• Highest value minus the lowest value
• Problem: ignores all the other values between the two extreme values
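A short sketch of the range, applied to the tract-level poverty percentages for Suburb A and Suburb B. The function name `value_range` is illustrative.

```python
suburb_a = [24, 23, 22, 21, 20]
suburb_b = [28, 25, 22, 19, 16]

def value_range(values):
    """Highest value minus the lowest value."""
    return max(values) - min(values)

range_a = value_range(suburb_a)
range_b = value_range(suburb_b)
print(range_a, range_b)  # same mean (22), different spread
```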
![Page 17: DESCRIBING DATA: 2. Numerical summaries of data using measures of central tendency and dispersion](https://reader036.vdocument.in/reader036/viewer/2022062407/56649d445503460f94a20cb4/html5/thumbnails/17.jpg)
Interquartile range
• Based on the quartiles (25th percentile and 75th percentile of a distribution)
• Interquartile range = Q3-Q1
• Semi-interquartile range = (Q3-Q1)/2
• Eliminates the effect of extreme scores by excluding them
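A hedged sketch of the calculation using Python's `statistics.quantiles`. Quartile conventions differ between textbooks and software, so the exact values of Q1 and Q3 depend on the method; this example uses the function's default ("exclusive") method on seven illustrative scores.

```python
import statistics

scores = [10, 12, 14, 15, 16, 18, 20]  # illustrative data

# n=4 splits the data into quartiles and returns the cut points Q1, Q2, Q3
q1, q2, q3 = statistics.quantiles(scores, n=4)

iqr = q3 - q1
semi_iqr = iqr / 2
print(q1, q3, iqr, semi_iqr)
```

Passing `method="inclusive"` to `statistics.quantiles` gives slightly different quartiles for the same data, which is why reported interquartile ranges should state the convention used.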
![Page 18: DESCRIBING DATA: 2. Numerical summaries of data using measures of central tendency and dispersion](https://reader036.vdocument.in/reader036/viewer/2022062407/56649d445503460f94a20cb4/html5/thumbnails/18.jpg)
Graphic representation: Box Plot
[Figure: box plots of infant mortality rate (vertical axis, scale -100 to 200) for Africa (N = 52), Asia (N = 44), and Latin America (N = 37), with two outlying cases labeled 101 and 132]
![Page 19: DESCRIBING DATA: 2. Numerical summaries of data using measures of central tendency and dispersion](https://reader036.vdocument.in/reader036/viewer/2022062407/56649d445503460f94a20cb4/html5/thumbnails/19.jpg)
Variance
• A measure of dispersion based on the second property of the mean we discussed earlier:
$$\sum (Y - \bar{Y})^2 = \text{minimum}$$
![Page 20: DESCRIBING DATA: 2. Numerical summaries of data using measures of central tendency and dispersion](https://reader036.vdocument.in/reader036/viewer/2022062407/56649d445503460f94a20cb4/html5/thumbnails/20.jpg)
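Step 1: Calculate the total sum of squares around the mean

| $Y$ | $(Y - \bar{Y})$ | $(Y - \bar{Y})^2$ |
|---|---|---|
| 10 | -5 | 25 |
| 12 | -3 | 9 |
| 14 | -1 | 1 |
| 15 | 0 | 0 |
| 16 | +1 | 1 |
| 18 | +3 | 9 |
| 20 | +5 | 25 |

Mean = 105/7 = 15; sum of squared differences = 70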
![Page 21: DESCRIBING DATA: 2. Numerical summaries of data using measures of central tendency and dispersion](https://reader036.vdocument.in/reader036/viewer/2022062407/56649d445503460f94a20cb4/html5/thumbnails/21.jpg)
Step 2: Take an average of this total variation
$$s^2 = \frac{\sum (Y - \bar{Y})^2}{n - 1}$$

Why n-1 rather than simply n?
The normal procedure involves estimating variance for a population using data from a sample.
Samples, especially small samples, are less likely to include extreme scores in the population.
Dividing by n-1 instead of n compensates for this underestimate.
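Steps 1 and 2 can be sketched as follows, using the seven scores from Step 1. The code computes the sum of squares by hand, divides by both n-1 and n for comparison, and cross-checks against the standard library's `statistics.variance` (which divides by n-1) and `statistics.pvariance` (which divides by n).

```python
import math
import statistics

y = [10, 12, 14, 15, 16, 18, 20]
n = len(y)
mean = sum(y) / n

# Step 1: total sum of squares around the mean
sum_sq = sum((v - mean) ** 2 for v in y)

# Step 2: average the variation
sample_variance = sum_sq / (n - 1)  # estimate for a population from a sample
divide_by_n = sum_sq / n            # smaller, for comparison

print(sum_sq, sample_variance, divide_by_n)

# Cross-check against the standard library
assert math.isclose(sample_variance, statistics.variance(y))
assert math.isclose(divide_by_n, statistics.pvariance(y))
```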
![Page 22: DESCRIBING DATA: 2. Numerical summaries of data using measures of central tendency and dispersion](https://reader036.vdocument.in/reader036/viewer/2022062407/56649d445503460f94a20cb4/html5/thumbnails/22.jpg)
Step 3: Take the square root of the variance to get the standard deviation
$$s = \sqrt{\frac{\sum (Y - \bar{Y})^2}{n - 1}}$$
Purpose: expresses dispersion in the original units of measurement--not units of measurement squared
Like variance: the larger the value the greater the variability
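As a worked example, using the seven scores from Step 1 (sum of squared differences = 70, n = 7), the result comes out in the original units of measurement:

$$s = \sqrt{\frac{\sum (Y - \bar{Y})^2}{n - 1}} = \sqrt{\frac{70}{7 - 1}} = \sqrt{11.67} \approx 3.42$$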
![Page 23: DESCRIBING DATA: 2. Numerical summaries of data using measures of central tendency and dispersion](https://reader036.vdocument.in/reader036/viewer/2022062407/56649d445503460f94a20cb4/html5/thumbnails/23.jpg)
Coefficient of Variation (V)
V = (standard deviation / mean)
Value: To allow you to make comparisons of dispersion across groups with very different mean values or across variables with very different measurement scales.
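A minimal sketch of why V helps with different measurement scales. It takes seven illustrative scores and a hypothetical copy of the same data recorded in units 1,000 times smaller (so every value is multiplied by 1,000). The standard deviations differ by a factor of 1,000, but V is the same. The function name is illustrative, and the calculation assumes a nonzero mean.

```python
import math
import statistics

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (mean must be nonzero)."""
    return statistics.stdev(values) / statistics.mean(values)

scores = [10, 12, 14, 15, 16, 18, 20]
rescaled = [v * 1000 for v in scores]  # hypothetical: same data, different units

sd_scores = statistics.stdev(scores)
sd_rescaled = statistics.stdev(rescaled)
v_scores = coefficient_of_variation(scores)
v_rescaled = coefficient_of_variation(rescaled)

print(sd_scores, sd_rescaled)  # not directly comparable
print(v_scores, v_rescaled)    # comparable: unit-free ratios
```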