investigation data colllection data presentation tabulation diagrams graphs descriptive statistics...
TRANSCRIPT
INVESTIGATIONINVESTIGATION
Data Colllection
Data Presentation
TabulationDiagramsGraphs
Descriptive Statistics
Measures of LocationMeasures of Dispersion
Measures of Skewness & Kurtosis
Inferential Statistiscs
Estimation Hypothesis TestingPonit estimateInteval estimate
Univariate analysis
Multivariate analysis
EXAMPLE:
(1) 7,8,9,10,11 n=5, x=45, =45/5=9
(2) 3,4,9,12,15 n=5, x=45, =45/5=9
(3) 1,5,9,13,17 n=5, x=45, =45/5=9
S.D. : (1) 1.58 (2) 4.74 (3) 6.32
x
x
x
3
Measures of Dispersion
Or
Measures of variability
measures of dispersion summarize differences in the data, how the numbers differ from one another.
Measures of DispersionMeasures of Dispersion
5
Series I: 70 70 70 70 70 70 70 70 70 70
Series II: 66 67 68 69 70 70 71 72 73 74
Series III: 1 19 50 60 70 80 90 100 110 120
6
Measures of Variability
• A single summary figure that describes the spread of observations within a distribution.
7
MEASURES OF DESPERSION
• RANGE
• INTERQUARTILE RANGE
• VARIANCE
• STANDARD DEVIATION
8
Measures of Variability
• Range– Difference between the smallest and largest observations.
• Interquartile Range– Range of the middle half of scores.
• Variance– Mean of all squared deviations from the mean.
• Standard Deviation– Rough measure of the average amount by which observations deviate from
the mean. The square root of the variance.
9
Range
• The difference between the lowest and highest values in the data set.
• The range can be misleading with outliers
data: 2,4,5,2,5,6,1,6,8,25,2
Sorted data: 1,2,2,2,3,4,5,6,6,8,25
24125 MinimumMaximumRange
10
Variability Example: Range
• Hotel Rates
52, 76, 100, 136, 186, 196, 205, 150, 257, 264, 264, 280, 282, 283, 303, 313, 317, 317, 325, 373, 384, 384, 400, 402, 417, 422, 472, 480, 643, 693, 732, 749, 750, 791, 891
• Range: 891-52 = 839
11
Measures of Position
Quartiles, Deciles,
Percentiles
12
Q1, Q2, Q3 divides ranked scores into four equal parts
Quartiles
25% 25% 25% 25%
Q3Q2Q1(minimum) (maximum)
(median)
13
Quartiles: 1 = Q n+1
th4
2 = Q 2(n+1) n+1 = th
4 2
3 = Q 3(n+1) th
4Inter quartile : IQR = Q3 – Q1
14
Inter quartile Range
• The inter quartile range is Q3-Q1
• 50% of the observations in the distribution are in the inter quartile range.
• The following figure shows the interaction between the quartiles, the median and the inter quartile range.
15
Inter quartile Range
16
Sample Number Unsorted Values1 252 27 3 20 4 235 266 247 198 169 2510 1811 3012 2913 3214 2615 2416 2117 2818 2719 2020 1621 14
17
Sample Number Unsorted Values Ranked Values1 25 142 27 163 20 164 23 185 26 196 24 20 7 19 208 16 219 25 2310 18 2411 30 2412 29 2513 32 2514 26 2615 24 2616 21 2717 28 2718 27 2819 20 2920 16 3021 14 32
18
Sample Number Unsorted Values Ranked Values1 25 14 Minimum2 27 163 20 164 23 185 26 196 24 20 LQ or Q1
7 19 208 16 219 25 2310 18 2411 30 24 Md or Q2
12 29 2513 32 2514 26 2615 24 2616 21 27 UQ or Q3
17 28 2718 27 2819 20 2920 16 3021 14 32 Maximum
19
D1, D2, D3, D4, D5, D6, D7, D8, D9
divides ranked data into ten equal parts
Deciles
10% 10% 10% 10% 10% 10% 10% 10% 10% 10%
D1 D2 D3 D4 D5 D6 D7 D8 D9
20
Quartiles
Q1 = P25
Q2 = P50
Q3 = P75
D1 = P10
D2 = P20
D3 = P30
• • •
D9 = P90
Deciles
21
Quartiles, Deciles, Percentiles
Fractiles
(Quantiles)partitions data into approximately equal parts
22
Percentiles and Quartiles
• Maximum is 100th percentile: 100% of values lie at or below the maximum
• Median is 50th percentile: 50% of values lie at or below the median
• Any percentile can be calculated. But the most common are 25th (1st Quartile) and 75th (3rd Quartile)
23
Locating Percentiles in a Frequency Distribution
• A percentile is a score below which a specific percentage of the distribution falls(the median is the 50th percentile.
• The 75th percentile is a score below which 75% of the cases fall.
• The median is the 50th percentile: 50% of the cases fall below it
• Another type of percentile :The quartile lower quartile is 25th percentile and the upper quartile is the 75th percentile
24
NUMBER OF CHILDREN
260 26.6 26.6 26.6
161 16.4 16.5 43.1
260 26.6 26.6 69.7
155 15.8 15.9 85.6
70 7.2 7.2 92.7
31 3.2 3.2 95.9
21 2.1 2.1 98.1
11 1.1 1.1 99.2
8 .8 .8 100.0
977 99.8 100.0
2 .2
979 100.0
0
1
2
3
4
5
6
7
EIGHT OR MORE
Total
Valid
NAMissing
Total
Frequency PercentValid
PercentCumulative
Percent
50th percentile
80th percentile
50% included here
80% included
here
25th percentile
25% included here
25
26
27
Five Number Summary
• Minimum Value
• 1st Quartile
• Median
• 3rd Quartile
• Maximum Value
28
VARIANCE:
Deviations of each observation from the mean, then averaging the sum of squares of these deviations.
STANDARD DEVIATION:
“ ROOT- MEANS-SQUARE-DEVIATIONS”
29
Variance
• The average amount that a score deviates from the typical score.
– Score – Mean = Difference Score
– Average of Difference Scores = 0
– In order to make this number not 0, square the difference scores (no negatives to cancel out the positives).
30
Variance: Computational Formula
• Population • Sample
2
222
)(
N
XXN
2
222
)(
n
XXnS
31
Standard Deviation
• To “undo” the squaring of difference scores, take the square root of the variance.
• Return to original units rather than squared units.
32
Quantifying Uncertainty
• Standard deviation: measures the variation of a variable in the sample.
– Technically,
s x xN ii
N
1
12
1
( )
33
Standard Deviation
• Population • Sample
Rough measure of the average amount by which observations deviate on either side of the mean. The square root of the variance.
2 s s2
(X )N
2 2)(
n
XXS
N X2 ( X )
N2
22
2
2 )(
n
XXnS
Example:
-1 1
3 9
-2 4
-3 9
2 4
1 1
Data: X = {6, 10, 5, 4, 9, 8}; N = 6
Total: 42 Total: 28
Standard Deviation:
76
42
N
XX
Mean:
Variance:2
2 ( ) 284.67
6
X Xs
N
16.267.42 ss
XX 2)( XX X
6
10
5
4
9
8
35
Calculating a Mean and a Standard Deviation
Absolute SquaredData Deviation Deviation Deviation
x x - Mean |x - Mean| (x-Mean)²10 -20 20 40020 -10 10 10030 0 0 040 10 10 10050 20 20 400
Sums 150 0 60 1000Means 30 0 12 200
Variance
14.1421356Standard deviation = Variance
36
Example of SD with discrete data
• Marks achieved by 7 students: 3, 4, 6, 2, 8, 8, 5
• Mean of these marks = 36/7 = 5.14
• Deviations from mean…
x x-x
3 3 - 5.14= -2.14
4 4 - 5.14= -1.14
6 6 - 5.14= 0.86
2 2 - 5.14= -3.14
8 8 - 5.14= 2.86
8 2.86
5 5 - 5.14= -0.14
Total = 0Problem! The sum of the deviations is always going to be 0!
Solution! Square them to get rid of the negatives…
(x – x)2
37
Example of SD with discrete data
• Marks achieved by 7 students: 3, 4, 6, 2, 8, 8, 5
• Mean of these marks = 36/7 = 5.14
• Deviations from mean…
x x-x
3 3 - 5.14= -2.14 4.59
4 4 - 5.14= -1.14 1.31
6 6 - 5.14= 0.86 0.73
2 2 - 5.14= -3.14 9.88
8 8 - 5.14= 2.86 8.16
8 2.86 8.16
5 5 - 5.14= -0.14 0.02
Total = 0
(x – x)2
Total = 32.85
Variance = 32.85 / 7 = 4.69
SD = √4.69 = 2.17
38
Variability Example: Standard Deviation
Mean: 6
Standard Deviation: 20.2
0.4
100
36004000
10
)60()400(102
2
S
S
S
S
0.210
40
10
)69()68()68()67()67()66()64()64()64()63(
)(
2222222222
2
S
S
n
XXS
2
2
2 )(
n
XXnS
X X2
3 94 164 164 166 367 497 498 648 649 81
Sum: 60 Sum: 400
39
40
41
42
Mean and Standard Deviation
• Using the mean and standard deviation together:– Is an efficient way to describe a distribution with just two numbers.
– Allows a direct comparison between distributions that are on different scales.
43
WHICH MEASURE TO USE ?
DISTRIBUTION OF DATA IS SYMMETRIC
---- USE MEAN & S.D.,
DISTRIBUTION OF DATA IS SKEWED
---- USE MEDIAN & QUARTILES
44
Mean, Median and Mode
45
Distributions
• Bell-Shaped (also known as symmetric” or “normal”)
• Skewed:– positively (skewed to the right) –
it tails off toward larger values
– negatively (skewed to the left) – it tails off toward smaller values
46
15 30 45 60 75
95% Confidence Interval for Mu
28 33 38 43
95% Confidence Interval for Median
Variable: Age
A-Squared:P-Value:
MeanStDevVarianceSkewnessKurtosisN
Minimum1st QuartileMedian3rd QuartileMaximum
32.3851
13.3380
28.0000
0.9620.014
36.450015.7356247.608
0.6796268.51E-02
60
11.000025.000031.500046.750079.0000
40.5149
19.1921
42.0000
Anderson-Darling Normality Test
95% Confidence Interval for Mu
95% Confidence Interval for Sigma
95% Confidence Interval for Median
Descriptive Statistics
47
ANY QUESTIONS