1 descriptive statistics chapter 3 msis 111 prof. nick dedeke
Post on 16-Jan-2016
1
Descriptive Statistics
Chapter 3, MSIS 111, Prof. Nick Dedeke
2
Objectives
- Define measures of central tendency, variability, shape and association
- Define statistical measures
- Compute statistical measures for ungrouped and grouped data
- Interpret statistical results
3
Introduction
In most competitive sports, one looks for the position of the athletes, e.g. who came in first, second, and so on. In statistics, one is interested in the following measures:
- most frequent value in data set
- summary of all values in data set
- midpoint position of data set
- positions of data in data set
- distances to midpoint of data set
4
Exercise: Statistical Measure 1
We want to find out which of the following students is the better one using the available data. The data shows the positions of the two competitors in several rounds of testing.
Kuli: 1st 2nd 1st 2nd 1st 4th 3rd 3rd 2nd 5th 1st
Marti: 3rd 2nd 3rd 2nd 2nd 1st 1st 1st 3rd 2nd 1st
5
Response: Commonsense Approach
We want to find out which of the following students is the better one using the available data.
Kuli: 1st 2nd 1st 2nd 1st 4th 3rd 3rd 2nd 5th 1st
Marti: 3rd 2nd 3rd 2nd 2nd 1st 1st 1st 3rd 2nd 1st
- 3 times Kuli was 1st, Marti was behind
- 3 times Marti was 1st, Kuli was behind
- Marti had more 2nd places
- Marti had more 3rd places
Imagine that you had a data set with 500 values!!
6
Mode
- The most frequently occurring value in a data set
- Applicable to all levels of data measurement (nominal, ordinal, interval, and ratio)
- Bimodal -- Data sets that have two modes
- Multimodal -- Data sets that contain more than two modes
7
Median
- Middle value in an ordered array of numbers
- Applicable for ordinal, interval, and ratio data
- Not applicable for nominal data
- Unaffected by extremely large and extremely small values
8
Median: Computational Procedure
First Procedure
- Arrange the observations in an ordered array.
- If there is an odd number of terms, the median is the middle term of the ordered array.
- If there is an even number of terms, the median is the average of the middle two terms.
Second Procedure
- The median's position in an ordered array is given by (n+1)/2.
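The position rule can be sketched in Python as follows. This is a minimal illustration, not part of the course material: the function name `median_by_position` is an assumption made for this example. It sorts the data, computes the 1-based position (n+1)/2, and averages the two neighbouring terms when that position falls between them.

```python
def median_by_position(values):
    """Median of a data set using the (n + 1) / 2 position rule."""
    ordered = sorted(values)            # the rule only works on an ordered array
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")

    position = (n + 1) / 2              # 1-based position, e.g. 9.0 or 8.5
    lower = ordered[int(position) - 1]  # term at the whole-number part of the position
    if position.is_integer():
        return lower                    # odd n: the middle term itself
    upper = ordered[int(position)]      # even n: the next term up
    return (lower + upper) / 2          # average of the two middle terms


print(median_by_position([3, 1, 2]))     # odd number of terms
print(median_by_position([4, 1, 3, 2]))  # even number of terms
```

For an odd number of terms the function returns the middle term unchanged; for an even number it returns the average of the two middle terms, matching the first procedure.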
9
Median: Odd Number Example (Long method)
Ordered array: 3 4 5 7 8 9 11 14 15 16 16 17 19 19 20 21 22
• There are 17 terms in the ordered array.
• Position of median = (n+1)/2 = (17+1)/2 = 9
• The median is the 9th term, which is 15.
• If the 22 is replaced by 100, the median is 15.
• If the 3 is replaced by -103, the median is 15.
10
Median: Even Number Example (Long Method)
Ordered array: 3 4 5 7 8 9 11 14 15 16 16 17 19 19 20 21
• There are 16 terms in the ordered array.
• Position of median = (n+1)/2 = (16+1)/2 = 8.5
• The median is between the 8th and 9th terms, 14.5.
NOTE
• If the 21 is replaced by 100, the median is 14.5.
• If the 3 is replaced by -88, the median is 14.5.
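The claim that extreme values leave the median unchanged can be checked with Python's standard `statistics` module. The variable names below are illustrative assumptions; the data are the two ordered arrays from the odd-number and even-number examples.

```python
from statistics import mean, median

# Ordered arrays from the odd-number and even-number examples.
odd = [3, 4, 5, 7, 8, 9, 11, 14, 15, 16, 16, 17, 19, 19, 20, 21, 22]
even = odd[:-1]                      # same array without the final 22

# Replace one extreme value at a time, as in the slides.
odd_high = odd[:-1] + [100]          # 22 replaced by 100
odd_low = [-103] + odd[1:]           # 3 replaced by -103
even_high = even[:-1] + [100]        # 21 replaced by 100
even_low = [-88] + even[1:]          # 3 replaced by -88

print(median(odd), median(odd_high), median(odd_low))     # median stays the same
print(median(even), median(even_high), median(even_low))  # median stays the same
print(mean(odd), mean(odd_high))                          # the mean, by contrast, shifts
```

The medians stay at 15 and 14.5 respectively, while the mean moves as soon as one extreme value changes.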
11
Arithmetic Mean
- Commonly called 'the mean'
- Is the average of a group of numbers
- Applicable for interval and ratio data
- Not applicable for nominal or ordinal data
- Affected by each value in the data set, including extreme values
- Computed by summing all values in the data set and dividing the sum by the number of values in the data set
12
Population Mean (Long method)
Data for total population: 57, 57, 86, 86, 42, 42, 43, 56, 57, 42, 42, 43

$$\mu = \frac{\sum X}{N} = \frac{X_1 + X_2 + X_3 + \dots + X_N}{N} = \frac{57 + 57 + 86 + 86 + 42 + 42 + 43 + 56 + 57 + 42 + 42 + 43}{12} = \frac{653}{12} = 54.4167$$
13
Computing Sample Mean (Long method)
$$\bar{X} = \frac{\sum X}{n} = \frac{X_1 + X_2 + X_3 + \dots + X_n}{n} = \frac{57 + 86 + 42}{3} = \frac{185}{3} = 61.667$$
Population mean is not the same thing as sample mean! Our numbers (57, 86, 42) are a sample drawn from the population, and hence a small segment of it.
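As a quick check of the long method, the two means can be computed side by side. This is only a sketch; the variable names `population`, `sample`, `mu` and `x_bar` are assumptions chosen to mirror the usual symbols for the population mean and the sample mean.

```python
# Population of N = 12 values and a sample of n = 3 values drawn from it.
population = [57, 57, 86, 86, 42, 42, 43, 56, 57, 42, 42, 43]
sample = [57, 86, 42]

mu = sum(population) / len(population)   # population mean: sum of X divided by N
x_bar = sum(sample) / len(sample)        # sample mean: sum of X divided by n

print(sum(population), round(mu, 4))     # 653 54.4167
print(sum(sample), round(x_bar, 3))      # 185 61.667
```

The formula is the same in both cases; only the set of values being averaged differs, which is why the two results do not agree.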
14
Computing Central Tend. Measures using Frequency Tables (Compact method)
Mean = Σ(Fi * Xi) / ΣFi = 1655/15 = 110.33

| Xi | Fi | Fi * Xi |
|---|---|---|
| 55 | 2 | 110 |
| 60 | 1 | 60 |
| 100 | 3 | 300 |
| 125 | 5 | 625 |
| 140 | 4 | 560 |
| Total | 15 | 1655 |

Mode = 125
Median position = (15+1)/2 = 8th
Median value = 125
THIS IS THE TYPE OF APPROACH YOU NEED TO MASTER FOR YOUR EXAM.
Data for total population: 55, 55, 60, 100, 100, 100, 125, 125, 125, 125, 125, 140, 140, 140, 140
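The compact method can be sketched in Python with a dictionary that maps each value Xi to its frequency Fi. The helper names `value_at` and `freq_median` are hypothetical, introduced only for this illustration.

```python
# Frequency table for the population above: value Xi -> frequency Fi.
freq = {55: 2, 60: 1, 100: 3, 125: 5, 140: 4}

n = sum(freq.values())                        # sum of Fi
total = sum(x * f for x, f in freq.items())   # sum of Fi * Xi
mean = total / n
mode = max(freq, key=freq.get)                # value with the highest frequency


def value_at(table, position):
    """Return the value sitting at a 1-based position of the ordered data."""
    running = 0
    for x in sorted(table):
        running += table[x]                   # cumulative frequency so far
        if position <= running:
            return x
    raise ValueError("position is beyond the last observation")


def freq_median(table):
    """Median from a frequency table using the (n + 1) / 2 position rule."""
    count = sum(table.values())
    low = value_at(table, (count + 1) // 2)   # middle term, or lower middle term
    high = value_at(table, (count + 2) // 2)  # same term when count is odd
    return (low + high) / 2


print(n, total, round(mean, 2))   # 15 1655 110.33
print(mode, freq_median(freq))    # 125 125.0
```

Note that `max` returns only one value, so a table with tied top frequencies (a bimodal data set) would need a separate check.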
15
Exercise: Computing Central Tend. Measures using Frequency Tables
Mean = Σ(Fi * Xi) / ΣFi =

| Xi | Fi | Fi * Xi |
|---|---|---|
| 1 | 2 | |
| 10 | 3 | |
| 4 | 4 | |
| 6 | 3 | |
| 12 | 2 | |
| Total | n = 14 | |

Mode =
Median position =
Median value =
16
Response: Computing Central Tend. Measures using Frequency Tables
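Mean = Σ(Fi * Xi) / ΣFi = 90/14 = 6.43

| Xi | Fi | Fi * Xi |
|---|---|---|
| 1 | 2 | 2 |
| 4 | 4 | 16 |
| 6 | 3 | 18 |
| 10 | 3 | 30 |
| 12 | 2 | 24 |
| Total | n = 14 | 90 |

Mode = 4 (it has the highest frequency, 4)
Median position = (14+1)/2 = 7.5 (between 7th and 8th)
Median value = (6+6)/2 = 6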
17
Opening Exercise: Using Statistical Measures
Kuli: 1st 2nd 1st 2nd 1st 4th 3rd 3rd 2nd 5th 1st
Marti: 3rd 2nd 3rd 2nd 2nd 1st 1st 1st 3rd 2nd 1st
Mode: Most frequently occurring value of variable
Mode for Kuli: 1st
Mode for Marti: 1st and 2nd (each occurs four times)
Mean: Average of the values of a variable
Sample mean = ΣXi / n
Mean or average score for Kuli: 25/11 = 2.27
Mean or average score for Marti: 21/11 = 1.9
18
Using Statistical Measures
Kuli: 1st 2nd 1st 2nd 1st 4th 3rd 3rd 2nd 5th 1st
Marti: 3rd 2nd 3rd 2nd 2nd 1st 1st 1st 3rd 2nd 1st
Median: The value in the middle of an ordered data set of n values.
Median point = (n + 1)/2 = (11 + 1)/2 = 6th position
Ordered data:
Kuli: 1st 1st 1st 1st 2nd 2nd 2nd 3rd 3rd 4th 5th
Marti: 1st 1st 1st 1st 2nd 2nd 2nd 2nd 3rd 3rd 3rd
Median score for Kuli is 2nd
Median score for Marti is 2nd
Notice: the median requires an ordered set.
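The three measures for both students can be reproduced with Python's standard `statistics` module. This is a sketch under one assumption: the placements are encoded as integers (1 for 1st, 2 for 2nd, and so on), which is a choice made for this illustration.

```python
from statistics import mean, median, multimode

# Placements per round, encoded as integers (1 = 1st, 2 = 2nd, ...).
kuli = [1, 2, 1, 2, 1, 4, 3, 3, 2, 5, 1]
marti = [3, 2, 3, 2, 2, 1, 1, 1, 3, 2, 1]

for name, scores in (("Kuli", kuli), ("Marti", marti)):
    modes = sorted(multimode(scores))   # multimode returns every tied mode
    print(name,
          "modes:", modes,
          "mean:", round(mean(scores), 2),
          "median:", median(scores))    # median sorts the data internally
```

`multimode` is used instead of `mode` because it reports all tied values, which matters when a data set is bimodal.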
19
Using Frequency Distribution Tables
Analysis of Kuli's performance
Mean = Σ(Fi * Xi) / ΣFi = 25/11 = 2.27
Mode = 1st
Median point = (11 + 1)/2 = 6th
Median value = 2nd
Using cumul. freq. column = 2nd

| Xi | Frequency (Fi) | Fi * Xi | Cum. (C Fi) |
|---|---|---|---|
| 1st | 4 | 4 | 4 |
| 2nd | 3 | 6 | 7 |
| 3rd | 2 | 6 | 9 |
| 4th | 1 | 4 | 10 |
| 5th | 1 | 5 | 11 |
| Total | 11 | 25 | |
20
Using Frequency Distribution Tables
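Analysis of Marti's performance
Mean = Σ(Fi * Xi) / ΣFi = 21/11 = 1.9
Mode = 1st & 2nd
Median point = (11 + 1)/2 = 6th
Median value = 2nd
Using cumul. freq. column = 2nd

| Xi | Frequency (Fi) | Fi * Xi | Cum. (C Fi) |
|---|---|---|---|
| 1st | 4 | 4 | 4 |
| 2nd | 4 | 8 | 8 |
| 3rd | 3 | 9 | 11 |
| 4th | 0 | 0 | 11 |
| 5th | 0 | 0 | 11 |
| Total | 11 | 21 | |

A cumulative frequency never decreases: a class with zero frequency keeps the running total of the classes before it.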
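The cumulative frequency column and the median lookup can be sketched with `itertools.accumulate` and `bisect_left` from the Python standard library. The list names are illustrative assumptions, and the lookup as written covers only an odd n, where the median point is a whole position.

```python
from bisect import bisect_left
from itertools import accumulate

# Marti's frequency table: placement Xi (1 = 1st, ...) and frequency Fi.
positions = [1, 2, 3, 4, 5]
marti_freq = [4, 4, 3, 0, 0]

products = [x * f for x, f in zip(positions, marti_freq)]   # Fi * Xi column
cumulative = list(accumulate(marti_freq))                   # running total of Fi

n = cumulative[-1]                  # the last cumulative value equals n
median_point = (n + 1) // 2         # n is odd here, so this is a whole position

# The median sits in the first row whose cumulative frequency reaches the median point.
median_value = positions[bisect_left(cumulative, median_point)]

print(products, sum(products))      # [4, 8, 9, 0, 0] 21
print(cumulative)                   # [4, 8, 11, 11, 11]
print(median_point, median_value)   # 6 2
```

The running total stays at 11 for the empty 4th and 5th rows, which is what allows the lookup to treat the column as a sorted list.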
21
Using Frequency Distribution Tables
Who is the better student?
| Measure | Marti | Kuli |
|---|---|---|
| Mean | 1.9 | 2.27 |
| Median value | 2nd | 2nd |
| Mode | 1st & 2nd | 1st |
22
New Case: Median measure
Analysis of Katie's performance
Mean = Σ(Fi * Xi) / ΣFi = 27/12 = 2.25
Mode = 3rd
Median point = (12 + 1)/2 = 6.5th, so the median value is between the 6th and 7th positions
Median value = (2nd + 3rd)/2 = 2.5th, the average of the values at the 6th and 7th positions

| Xi | Frequency (Fi) | Fi * Xi | Cum. (C Fi) |
|---|---|---|---|
| 1st | 4 | 4 | 4 |
| 2nd | 2 | 4 | 6 |
| 3rd | 5 | 15 | 11 |
| 4th | 1 | 4 | 12 |
| Total | 12 | 27 | |
23
Examples
24
Percentiles
Sometimes we are not analyzing several values from one person, but one value for several persons or objects. For example, we have data from the performance of several fund managers for year 2006. We want to present the data in the form: manager XX is in the top 10 or tenth percentile, or top 25 or 25th percentile.
The method used consists of three steps:
- organize data in ascending order
- calculate location of percentile you want
- identify the object in the percentile location from the data set
25
Interpretation: Percentiles
If manager YY is in the tenth percentile of a group, this means that at least 10% of everyone in the data set scored below manager YY and at most 90% of everyone in the data set scored better than manager YY. If manager Pico is in the 95th percentile of a group, this means that at least 95% of everyone in the data set scored below manager Pico and at most 5% of everyone in the data set scored better than manager Pico.
26
Exercise: Percentiles for Known Values
| First name | Fund performance |
|---|---|
| Bill | 106% |
| Jane | 109% |
| Sven | 114% |
| Larry | 116% |
| Dub | 121% |
| Anna | 122% |
| Cole | 125% |
| Salome | 129% |

In which percentile is Sven?
27
Deriving Percentiles with Cumulative Relative Frequency Approach for Observed Values

In which percentile is Sven?

| First name | Fund performance | Fi | Rel. fi | Cum. rel. fi | Percentile |
|---|---|---|---|---|---|
| Bill | 106% | 1 | 1/8 | 1/8 = 0.125 | 12.5th Percentile |
| Jane | 109% | 1 | 1/8 | 2/8 = 0.25 | 25th Percentile |
| Sven | 114% | 1 | 1/8 | 3/8 = 0.375 | 37.5th Percentile |
| Larry | 116% | 1 | 1/8 | 4/8 = 0.50 | 50th Percentile |
| Dub | 121% | 1 | 1/8 | 5/8 = 0.625 | 62.5th Percentile |
| Anna | 122% | 1 | 1/8 | 6/8 = 0.75 | 75th Percentile |
| Cole | 125% | 1 | 1/8 | 7/8 = 0.875 | 87.5th Percentile |
| Salome | 129% | 1 | 1/8 | 8/8 = 1 | 100th Percentile |

N = 8
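The cumulative relative frequency approach can be sketched in Python as follows. The dictionary name `funds` and the derived `percentile_of` mapping are assumptions made for this illustration; each manager appears once, so every Fi is 1 and the cumulative relative frequency is simply rank / N.

```python
# Fund performance (in %) for each manager, as in the table.
funds = {
    "Bill": 106, "Jane": 109, "Sven": 114, "Larry": 116,
    "Dub": 121, "Anna": 122, "Cole": 125, "Salome": 129,
}

# Step 1: organize the data in ascending order.
ordered = sorted(funds.items(), key=lambda item: item[1])
n = len(ordered)

# Step 2: cumulative relative frequency = rank / N, expressed as a percentile.
percentile_of = {
    name: 100 * rank / n
    for rank, (name, _) in enumerate(ordered, start=1)
}

# Step 3: read off the percentile for an observed value.
print(percentile_of["Sven"])     # 37.5
print(percentile_of["Salome"])   # 100.0
```

This lookup only answers questions about observed values; a percentile that falls between two rows, such as the 90th, needs one of the location formulas.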
28
Deriving Percentiles with Cumulative Relative Frequency Approach for Unobserved Values
What is the value of the 90th percentile?

| First name | Fund performance | Fi | Rel. fi | Cum. rel. fi | Percentile |
|---|---|---|---|---|---|
| Bill | 106% | 1 | 1/8 | 1/8 | 12.5th Percentile |
| Jane | 109% | 1 | 1/8 | 2/8 | 25th Percentile |
| Sven | 114% | 1 | 1/8 | 3/8 | 37.5th Percentile |
| Larry | 116% | 1 | 1/8 | 4/8 | 50th Percentile |
| Dub | 121% | 1 | 1/8 | 5/8 | 62.5th Percentile |
| Anna | 122% | 1 | 1/8 | 6/8 | 75th Percentile |
| Cole | 125% | 1 | 1/8 | 7/8 | 87.5th Percentile |
| Salome | 129% | 1 | 1/8 | 1 | 100th Percentile |

N = 8
29
Computing Data Values When Given Percentile Locations (Approximate method)
90th percentile location i = (P/100) * N = 0.9 * 8 = 7.2
The result is not an integer, so the percentile position is the whole-number part of i plus 1: 7 + 1 = 8th position.
90th percentile value from tables = 129%
This is an approximate method because the formula gives the same result for multiple percentiles: it gives the same result of 129% for the 91st, 92nd, 93rd, up to 100th percentiles.
50th percentile location i = (P/100) * N = 0.5 * 8 = 4
The result is an integer, so the percentile is the average of the values at the 4th and 5th positions.
50th percentile = (4th value + 5th value)/2 = (116 + 121)/2 = 118.5% (But from tables we see that 116% is also the 50th percentile)
RECOMMENDATION: USE THIS APPROXIMATE APPROACH FORMULA WHEN YOU ARE DEALING WITH UNOBSERVED VALUES. IF YOU USE THE APPROACH IN THE EXAM, YOU WILL NOT BE MARKED WRONG.
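A minimal Python sketch of the approximate method follows. The function name `percentile_approx` is an assumption, and the sketch rests on one reading of the rule: when i is a whole number the percentile is the average of the values at positions i and i + 1, otherwise it is the value at the whole-number part of i plus 1. It is restricted to 0 < P < 100 so that position i + 1 always exists.

```python
def percentile_approx(values, p):
    """Approximate percentile value using the location i = (P / 100) * N."""
    if not 0 < p < 100:
        raise ValueError("this sketch only handles 0 < P < 100")
    ordered = sorted(values)                # step 1: ascending order
    n = len(ordered)

    # i = P * N / 100, split into whole-number part and remainder.
    whole, remainder = divmod(p * n, 100)
    whole = int(whole)

    if remainder == 0:
        # i is a whole number: average the values at positions i and i + 1.
        return (ordered[whole - 1] + ordered[whole]) / 2
    # i is not a whole number: take the value at position (whole part of i) + 1.
    return ordered[whole]


performance = [106, 109, 114, 116, 121, 122, 125, 129]
print(percentile_approx(performance, 90))   # 129
print(percentile_approx(performance, 50))   # 118.5
print(percentile_approx(performance, 95))   # 129, same value as for the 90th
```

The last call shows the weakness the slide points out: several different percentiles map to the same observation.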
30
Computing Percentile locations with arithmetic formula (More precise method)
90th percentile location i = (P/100) * N = 0.9 * 8 = 7.2th position
The 90th percentile is 0.2, or 20%, of the way between the 7th and 8th positions.
The value for the 90th percentile is computed as:
7th position's value + (8th position's value - 7th position's value) * fraction got from computing i
= 125% + (129% - 125%) * 0.2 = 125.8% (~ 126%)
50th percentile location i = (P/100) * N = 0.5 * 8 = 4th position
50th percentile = 116%
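The more precise method amounts to linear interpolation between two neighbouring observations. The sketch below is an illustration under stated assumptions: the function name `percentile_interpolated` is hypothetical, and locations that fall before the first observation (i below 1) are rejected because the slides do not say how to treat them.

```python
def percentile_interpolated(values, p):
    """Percentile value by interpolating at location i = (P / 100) * N."""
    ordered = sorted(values)
    n = len(ordered)

    # Split i = P * N / 100 into its whole-number part and its fraction.
    whole, remainder = divmod(p * n, 100)
    whole = int(whole)
    fraction = remainder / 100

    if whole < 1 or whole > n:
        raise ValueError("location falls outside the observed positions")

    lower = ordered[whole - 1]             # value at the whole-number position
    if fraction == 0:
        return lower                       # i is a whole position: no interpolation
    upper = ordered[whole]                 # value at the next position
    return lower + (upper - lower) * fraction


performance = [106, 109, 114, 116, 121, 122, 125, 129]
print(round(percentile_interpolated(performance, 90), 1))   # 125.8
print(percentile_interpolated(performance, 50))             # 116
```

Unlike the approximate method, neighbouring percentiles such as the 90th and 95th now produce different values.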
31
Overview Measures and Summary of Conditions for Using Descriptive Measures
Which statistical measures can be used depends on the level of measurement of the data. For some levels, e.g. the nominal level, many statistical measures cannot be used.
32
Descriptive Measures for Grouped Data
Mean, median and mode can all be computed for quantitative data sets that were measured at an appropriate level.
33
Exercise: Central Tendency Measures for Grouped Data

| Class interval | Frequency (Fi) | Midpoint (Mi) |
|---|---|---|
| [1 – 3) inches | 16 | 2 |
| [3 – 5) inches | 2 | 4 |
| [5 – 7) inches | 4 | 6 |
| [7 – 9) inches | 3 | 8 |
| [9 – 11) inches | 9 | 10 |
| [11 – 13) inches | 6 | 12 |
| Total | 40 | |

Modal class:
Median position:
Median class:
34
Response: Central Tendency Measures for Grouped Data

| Class interval | Frequency (Fi) | Midpoint (Mi) |
|---|---|---|
| [1 – 3) inches | 16 | 2 |
| [3 – 5) inches | 2 | 4 |
| [5 – 7) inches | 4 | 6 |
| [7 – 9) inches | 3 | 8 |
| [9 – 11) inches | 9 | 10 |
| [11 – 13) inches | 6 | 12 |
| Total | 40 | |

Modal class: [1 – 3) inches
Median position: (n+1)/2 = 41/2 = 20.5, between the 20th and 21st positions
Median class: [5 – 7) inches (this would be hard to derive if it were between the 18th and 19th positions, i.e. if it crossed two classes)
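The modal class and median class can be located programmatically by walking the cumulative frequencies. This is a sketch only: the `classes` list and the helper `class_at` are names assumed for this illustration, and each interval is stored as a (lower, upper) pair with the upper bound excluded.

```python
import math

# Grouped data: ((lower bound, upper bound), frequency Fi), upper bound excluded.
classes = [
    ((1, 3), 16), ((3, 5), 2), ((5, 7), 4),
    ((7, 9), 3), ((9, 11), 9), ((11, 13), 6),
]

# Modal class: the interval with the highest frequency.
modal_class = max(classes, key=lambda item: item[1])[0]

n = sum(f for _, f in classes)
median_position = (n + 1) / 2


def class_at(position):
    """Return the interval containing the observation at a 1-based position."""
    running = 0
    for interval, f in classes:
        running += f                      # cumulative frequency so far
        if position <= running:
            return interval
    raise ValueError("position is beyond the last observation")


# The median lies between the positions on either side of 20.5.
lower = class_at(math.floor(median_position))
upper = class_at(math.ceil(median_position))
median_class = lower if lower == upper else (lower, upper)

print(modal_class)                     # (1, 3)
print(median_position, median_class)   # 20.5 (5, 7)
```

When the two neighbouring positions fall in different classes, the sketch returns both intervals rather than choosing one, which is the awkward case the slide mentions.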
35
Example: Central Tendency Measures for Grouped Data

| Class interval | Frequency (Fi) | Midpoint (Mi) | (Fi)*(Mi) |
|---|---|---|---|
| [1 – 3) inches | 16 | 2 | 32 |
| [3 – 5) inches | 2 | 4 | 8 |
| [5 – 7) inches | 4 | 6 | 24 |
| [7 – 9) inches | 3 | 8 | 24 |
| [9 – 11) inches | 9 | 10 | 90 |
| [11 – 13) inches | 6 | 12 | 72 |
| Total | 40 | | 250 |

Find the mean for the distribution:
Mean = (Σ Fi*Mi)/n = 250/40 = 6.25 inches
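The grouped mean can be checked with a short Python sketch. The `classes` list is an assumed representation of the table, with each midpoint computed as the average of the class bounds. Summing the (Fi)*(Mi) column directly gives 32 + 8 + 24 + 24 + 90 + 72 = 250.

```python
# Grouped data: ((lower bound, upper bound), frequency Fi).
classes = [
    ((1, 3), 16), ((3, 5), 2), ((5, 7), 4),
    ((7, 9), 3), ((9, 11), 9), ((11, 13), 6),
]

n = 0
weighted_sum = 0.0
for (low, high), f in classes:
    midpoint = (low + high) / 2      # Mi: centre of the class interval
    weighted_sum += f * midpoint     # Fi * Mi
    n += f

grouped_mean = weighted_sum / n      # (sum of Fi * Mi) / n

print(n, weighted_sum, grouped_mean)   # 40 250.0 6.25
```

The result is an estimate of the mean, since every observation in a class is treated as if it sat at the class midpoint.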
36
Exercise: Central Tendency Measures for Grouped Data

| Class interval | Frequency (Fi) | Midpoint (Mi) | (Fi)*(Mi) |
|---|---|---|---|
| [1 – 2) inches | 2 | | |
| [2 – 3) inches | 2 | | |
| [3 – 4) inches | 4 | | |
| [4 – 5) inches | 2 | | |
| [5 – 6) inches | 1 | | |

Find the mean for the distribution:
Mean = (Σ Fi*Mi)/n =      inches
37
Response: Central Tendency Measures for Grouped Data

| Class interval | Frequency (Fi) | Midpoint (Mi) | (Fi)*(Mi) |
|---|---|---|---|
| [1 – 2) inches | 2 | 1.5 | 3 |
| [2 – 3) inches | 2 | 2.5 | 5 |
| [3 – 4) inches | 4 | 3.5 | 14 |
| [4 – 5) inches | 2 | 4.5 | 9 |
| [5 – 6) inches | 1 | 5.5 | 5.5 |
| Total | 11 | | 36.5 |

Find the mean for the distribution:
Mean = (Σ Fi*Mi)/n = 36.5/11 = 3.318 inches
38
Excel Examples