numerical descriptive measures cont d · 2/16/2017 · 5 measures of positions: percentiles a...
Numerical Descriptive Measures Cont’d
Measures of Central Tendency (Location)
The Arithmetic Mean (Average)
The Median
The Mode
Measures of Dispersion or Variability
The Range
The variance
The Standard Deviation
The Coefficient of Variation
Measures of Relative Standing (Measures of Position)
Z Score
Quartiles and Percentiles
STA 231: Biostatistics
Dr. Ahmed Jaradat, AGU, 2/12/2017
Measures of Relative Standing (Measures of Position)
In some cases, certain individual items in the data set are of more interest than the entire set.
At times it is necessary to measure how an item fits into the data, how it compares to other items of the data, or even how it compares to an item in another data set.
Measures of position are common ways of making such comparisons.
Z Score (or Standard Score)
The number of standard deviations that a given value x is above or below the mean. The Z-score is given by:
Population: $Z = \frac{x - \mu}{\sigma}$
Sample: $Z = \frac{x - \bar{x}}{s}$
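The two formulas differ only in which mean and standard deviation are plugged in. A minimal sketch, assuming Python's standard `statistics` module; the function name `z_score` and the data values are hypothetical and chosen only for illustration:

```python
import statistics


def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


# Hypothetical measurements (not taken from the slides).
sample = [4, 8, 6, 5, 7]

x_bar = statistics.mean(sample)      # mean of the data
s = statistics.stdev(sample)         # sample standard deviation (divides by n - 1)
sigma = statistics.pstdev(sample)    # population standard deviation (divides by n)

print(z_score(8, x_bar, s))      # Z-score of 8, treating the data as a sample
print(z_score(8, x_bar, sigma))  # Z-score of 8, treating the data as a population
```

Because s is never smaller than the population version computed from the same data, the sample Z-score of a value is never larger in magnitude than its population Z-score.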
Example: Comparing Z-Scores
Two students, who take different medical classes, had exams on the same day.
Khalid’s score was 83 while Hassan’s score was 78. Which student did relatively
better, given the class data shown below?
                           Khalid’s class   Hassan’s class
Class mean                       78               70
Class standard deviation          4                5
Khalid’s Z-score: $Z = \frac{83 - 78}{4} = 1.25$
Hassan’s Z-score: $Z = \frac{78 - 70}{5} = 1.6$
Hassan’s Z-score is higher, so he was positioned relatively higher within his class than Khalid was within his class.
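The comparison can be checked in a few lines. This is only a sketch; the helper `z_score` is a hypothetical name, and the numbers are the scores, class means, and class standard deviations given in the example:

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above or below the mean."""
    return (x - mean) / sd


khalid = z_score(83, 78, 4)   # Khalid: score 83, class mean 78, class sd 4
hassan = z_score(78, 70, 5)   # Hassan: score 78, class mean 70, class sd 5

print(khalid, hassan)  # 1.25 1.6

# The higher Z-score marks the relatively better performance.
better = "Hassan" if hassan > khalid else "Khalid"
print(better)  # Hassan
```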
Measures of Positions: Percentiles
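Median: divides the ordered data into two parts of 50% each.
Quartiles (Q1, Q2 = Median, Q3): divide the ordered data into four parts of 25% each.
Deciles (D1, D2, …, D9, with D5 = Q2): divide the ordered data into ten parts of 10% each.
Percentiles (P1, P2, …, P99, with P50 = Q2): divide the ordered data into one hundred parts of 1% each.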
Measures Of Positions: Percentiles
A percentile is the score below which a specified percentage of the scores in a distribution fall.
The percentile rank of a score indicates the percentage of scores in the distribution that fall at or below that score.
Percentile ($P_r$)
The rth percentile of a set of measurements is the value for which:
At most r% of the measurements are less than that value.
At most (100-r)% of the measurements are greater than that value.
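The percentile rank defined above is a simple count. A minimal sketch, assuming the "at or below" definition; the function name `percentile_rank` and the scores are hypothetical:

```python
def percentile_rank(scores, x):
    """Percentage of scores in the distribution that fall at or below x."""
    at_or_below = sum(1 for s in scores if s <= x)
    return 100 * at_or_below / len(scores)


# Hypothetical exam scores (not taken from the slides).
scores = [55, 60, 65, 70, 70, 75, 80, 85, 90, 95]

print(percentile_rank(scores, 80))  # 70.0
```

Here 7 of the 10 scores are at or below 80, so a score of 80 has a percentile rank of 70.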
Example
Suppose 600 is the 78th percentile (P78) of GMAT scores. Then, on the score scale from 200 to 800, 78% of all the scores lie between 200 and 600, and 22% lie between 600 and 800.
Commonly used percentiles
First (lower) Decile = 10th percentile
First (lower) Quartile, Q1 = 25th percentile
Second (middle) Quartile, Q2 = 50th percentile = Median
Third Quartile, Q3 = 75th percentile
Ninth (upper) Decile = 90th percentile
Location of Percentiles:
Find the location of any percentile using the formula
$L_r = (n + 1)\,\frac{r}{100}$, where $L_r$ is the location of the rth percentile.
If the result is a whole number then it is the ranked position to use.
If the result is a fractional half (e.g. 7.5, 11.5, 26.5, etc.) then average the two corresponding data values.
If the result is not a whole number or a fractional half then interpolate between the data points.
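The location rule can be sketched as one small function. This is an illustrative implementation, not a library routine: the name `percentile`, the clamping of locations that fall outside the data, and the sample values are all assumptions. Linear interpolation also covers the "fractional half" case, since interpolating halfway is the same as averaging the two values.

```python
def percentile(data, r):
    """rth percentile using the location L_r = (n + 1) * r / 100."""
    xs = sorted(data)
    n = len(xs)
    loc = (n + 1) * r / 100          # 1-based location in the ordered data

    # Assumption: locations outside 1..n are clamped to the extreme values.
    if loc <= 1:
        return xs[0]
    if loc >= n:
        return xs[-1]

    k = int(loc)                     # whole part: the k-th ordered value
    frac = loc - k                   # fractional part: how far toward the next value
    return xs[k - 1] + frac * (xs[k] - xs[k - 1])


# Hypothetical data (n = 7), so L_r = 8 * r / 100.
data = [40, 10, 70, 30, 20, 60, 50]

print(percentile(data, 25))  # location 2.0  -> 2nd ordered value
print(percentile(data, 50))  # location 4.0  -> 4th ordered value
print(percentile(data, 30))  # location 2.4  -> between the 2nd and 3rd values
```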
Example
Find the first quartile and the median of the following set of measurements (n=15):
7, 8, 12, 17, 29, 18, 4, 27, 30, 2, 4, 10, 21, 5, 8
Ordered data:
1st 2nd 3rd 4th 5th 6th 7th 8th 9th 10th 11th 12th 13th 14th 15th
2, 4, 4, 5, 7, 8, 8, 10, 12, 17, 18, 21, 27, 29, 30
$L_{25} = (15 + 1)\,\frac{25}{100} = 4.0$, so P25 = Q1 = 4th observation = 5
$L_{50} = (15 + 1)\,\frac{50}{100} = 8.0$, so P50 = Q2 = Median = 8th observation = 10
Example
Find the median and third quartile of the following set of measurements (n=16):
7, 8, 12, 17, 29, 18, 4, 27, 30, 2, 4, 10, 21, 5, 8, 40
Ordered data:
1st 2nd 3rd 4th 5th 6th 7th 8th 9th 10th 11th 12th 13th 14th 15th 16th
2, 4, 4, 5, 7, 8, 8, 10, 12, 17, 18, 21, 27, 29, 30, 40
$L_{50} = (16 + 1)\,\frac{50}{100} = 8.5$, so
P50 = Q2 = Median = (8th observation + 9th observation)/2
= (10+12)/2 = 11
$L_{75} = (16 + 1)\,\frac{75}{100} = 12.75$, so
P75 = Q3 = 12th value + 0.75(13th value – 12th value)
= 21 + 0.75(27-21) = 25.5
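As a cross-check on the n=16 example, Python's standard `statistics.quantiles` with `method="exclusive"` places its cut points at position i(n + 1)/k in the ordered data, which for quartiles (k = 4) agrees with the location formula used here. A short sketch, assuming Python 3.8 or later:

```python
import statistics

data = [7, 8, 12, 17, 29, 18, 4, 27, 30, 2, 4, 10, 21, 5, 8, 40]

# n=4 asks for the three cut points that split the data into four groups.
q1, q2, q3 = statistics.quantiles(data, n=4, method="exclusive")

print(q2, q3)  # 11.0 25.5
```

Other software may use a different interpolation rule by default, so quartiles reported elsewhere can differ slightly from these hand calculations.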
Example
Calculate the 30th, 67th, and 90th percentiles of the following set of measurements (n=15):
7, 8, 12, 17, 29, 18, 4, 27, 30, 2, 4, 10, 21, 5, 8
Ordered data:
1st 2nd 3rd 4th 5th 6th 7th 8th 9th 10th 11th 12th 13th 14th 15th
2, 4, 4, 5, 7, 8, 8, 10, 12, 17, 18, 21, 27, 29, 30
$L_{30} = (15 + 1)\,\frac{30}{100} = 4.8$, so
P30 = 4th observation + 0.8(5th observation - 4th observation) = 5 + 0.8(7-5) = 6.6
$L_{67} = (15 + 1)\,\frac{67}{100} = 10.72$, so
P67 = 10th observation + 0.72(11th observation - 10th observation) = 17 + 0.72(18-17) = 17.72
$L_{90} = (15 + 1)\,\frac{90}{100} = 14.4$, so
P90 = 14th observation + 0.4(15th observation - 14th observation) = 29 + 0.4(30-29) = 29.4
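Arbitrary percentiles can be checked the same way by asking for 99 cut points (P1 to P99). A sketch under the same assumption as before, namely that `statistics.quantiles` with `method="exclusive"` matches the (n + 1)r/100 location rule; the variable name `cuts` is arbitrary:

```python
import statistics

data = [7, 8, 12, 17, 29, 18, 4, 27, 30, 2, 4, 10, 21, 5, 8]

# n=100 returns 99 cut points; cuts[r - 1] is the r-th percentile.
cuts = statistics.quantiles(data, n=100, method="exclusive")

for r in (30, 67, 90):
    print(f"P{r} = {cuts[r - 1]:.2f}")
# P30 = 6.60
# P67 = 17.72
# P90 = 29.40
```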
Interquartile Range, IQR
IQR = P75 – P25 = Q3 – Q1
The IQR gives a range that is not influenced by extremely high or low scores.
Whereas the range is the spread across 100% of the scores, the IQR is the spread across the middle 50%.
A large IQR indicates a large spread of the observations.
Measures like Q1, Q3, and IQR that are not influenced by outliers or extreme values are called resistant measures.
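Resistance can be seen by changing one extreme value and comparing the range with the IQR. A small sketch: the helper names `iqr` and `value_range` are hypothetical, the first data set is the ordered n=15 set from the earlier examples, and replacing its largest value by 300 is an invented modification for illustration.

```python
import statistics


def iqr(data):
    """Interquartile range Q3 - Q1, with quartiles from the (n + 1) location rule."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="exclusive")
    return q3 - q1


def value_range(data):
    """Range: largest value minus smallest value."""
    return max(data) - min(data)


data = [2, 4, 4, 5, 7, 8, 8, 10, 12, 17, 18, 21, 27, 29, 30]
with_extreme = data[:-1] + [300]   # hypothetical: largest value replaced by 300

print(value_range(data), iqr(data))                  # 28 16.0
print(value_range(with_extreme), iqr(with_extreme))  # 298 16.0
```

The range jumps from 28 to 298, while the IQR stays at 16 because Q1 and Q3 do not depend on the most extreme observations.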
Box plots
A box plot is a graphical display that provides the main descriptive
measures of the measurement set.
To construct a box plot, we need only five statistics:
1. The minimum value,
2. Q1 (The first quartile),
3. The median,
4. Q3 (The third quartile), and
5. The maximum value.
Outliers: An observation x is called an outlier if:
x < Q1 - 1.5 (IQR) or x > Q3 + 1.5 (IQR)
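The outlier rule translates directly into two cut-off values, often called fences. A minimal sketch; the function name `find_outliers` and the data set (which contains one invented extreme value) are hypothetical, and the quartiles are computed with `statistics.quantiles(..., method="exclusive")` on the assumption that it matches the location rule used in these notes:

```python
import statistics


def find_outliers(data):
    """Return (outliers, (lower_fence, upper_fence)) using the 1.5 * IQR rule."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="exclusive")
    iqr = q3 - q1
    lower = q1 - 1.5 * iqr
    upper = q3 + 1.5 * iqr
    outliers = [x for x in data if x < lower or x > upper]
    return outliers, (lower, upper)


# Hypothetical data with one extreme value.
data = [2, 4, 4, 5, 7, 8, 8, 10, 12, 17, 18, 21, 27, 29, 300]

outliers, fences = find_outliers(data)
print(fences)    # (-19.0, 45.0)
print(outliers)  # [300]
```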
The Box Plot
[Figure: a box plot with its parts labelled minimum, Q1, Median (Q2), Q3, and maximum]
Example: Draw a box plot for the data:
11 12 13 16 16 17 18 21 22
minimum = 11, Q1 = 12.5, Median = 16, Q3 = 19.5, maximum = 22
Each of the four segments between these values holds 25% of the data.
Interquartile range = 19.5 – 12.5 = 7
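The five statistics behind this box plot can be reproduced with a short helper. This is a sketch only; `five_number_summary` is a hypothetical name, and the quartiles again rely on `statistics.quantiles` with `method="exclusive"` agreeing with the (n + 1)r/100 location rule:

```python
import statistics


def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum)."""
    q1, median, q3 = statistics.quantiles(data, n=4, method="exclusive")
    return min(data), q1, median, q3, max(data)


data = [11, 12, 13, 16, 16, 17, 18, 21, 22]

summary = five_number_summary(data)
print(summary)                  # (11, 12.5, 16.0, 19.5, 22)
print(summary[3] - summary[1])  # 7.0  (interquartile range)
```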
Distribution Shape and The Box plot
[Figure: three box plots, each marked Q1, Q2, Q3, for negatively skewed, symmetrical, and positively skewed distributions]
Example: Consider the Biostatistics grades
S = 41, L = 100, Q1 = 66.5, Q2 = 76, Q3 = 89, IQR = 89 - 66.5 = 22.5
Interpreting the box plot results
The scores range from 41 (S, the smallest) to 100 (L, the largest).
About half the scores are smaller than 76, and about half are larger than 76.
About half the scores lie between 66.5 and 89.
About a quarter lie below 66.5 and a quarter lie above 89.
An outlier is any grade x < 66.5 - 1.5(22.5) = 32.75 or x > 89.0 + 1.5(22.5) = 122.75. All grades lie between 41 and 100, so there are no outliers.