Basic Statistics
Measures of Central Tendency
STRUCTURE OF STATISTICS
STATISTICS
• DESCRIPTIVE: TABULAR, GRAPHICAL, NUMERICAL
• INFERENTIAL: CONFIDENCE INTERVALS, TESTS OF HYPOTHESIS
Consider the following distribution of scores:
How do the red and blue distributions differ?
How do the red and green distributions differ?
Characteristics of Distributions
• Location or Center: can be indexed by using a measure of central tendency
• Variability or Spread: can be indexed by using a measure of variability
Consider the following distributions:
How do they differ?
Consider the following two distributions:
How do the green and red distributions differ?
Characteristics of Distributions
• Location or Central Tendency
• Variability
• Symmetry
• Kurtosis
STRUCTURE OF STATISTICS: NUMERICAL DESCRIPTIVE MEASURES
DESCRIPTIVE
• TABULAR
• GRAPHICAL
• NUMERICAL: CENTRAL TENDENCY, VARIABILITY, SYMMETRY, KURTOSIS
Measures of Central Tendency
Summarizing Data
• The Mean
• The Median
• The Mode
Each gives you one score or measure that represents, or is typical of, an entire group of scores.
Central Tendency
Most scores tend to center toward a point in the distribution.
[Figure: frequency plotted against score]
[Figure: a scatter of unorganized raw scores]
[Figure: ways of summarizing raw scores, given their measurement scales]
• Tabulating: Frequency Tables
• Graphing: Graphs
• Averaging: Measures of Central Tendency (The Mean, The Median, The Mode)
Measures of Central Tendency
Measures of central tendency are statistics that describe typical, average, or representative scores.
The most common measures of central tendency (mean, median, and mode) are quite different in conception and calculation.
These three statistics reflect different notions of the “center” of a distribution.
“The Mode”
The score that occurs most frequently.
In case of ungrouped frequency distribution
The mode is the single score with the largest frequency.
In case of grouped frequency distribution
When observations have been grouped into classes, the midpoint of the class with the largest frequency is used as an estimate of the mode.
Example: if the 51-53 class has the largest frequency, the mode of the distribution is estimated to be 52, the midpoint of that class.
Unimodal Distribution: One Mode
Bimodal Distribution: Two Modes
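The two ways of finding a mode described above can be sketched in Python. This is only an illustration: the function names `mode_ungrouped` and `mode_grouped` are made up for this example, and the class frequencies are hypothetical, chosen so that the 51-53 class is the most frequent one. The weight scores are the ones listed on the slides.

```python
from collections import Counter


def mode_ungrouped(scores):
    """Return every score tied for the highest frequency (one value if unimodal)."""
    counts = Counter(scores)
    top = max(counts.values())
    return sorted(s for s, c in counts.items() if c == top)


def mode_grouped(class_freqs):
    """Estimate the mode as the midpoint of the class with the largest frequency.

    class_freqs maps (lower limit, upper limit) to the class frequency.
    """
    (lower, upper), _ = max(class_freqs.items(), key=lambda kv: kv[1])
    return (lower + upper) / 2


# Weight scores from the slides
weights = [68, 56, 39, 56, 44, 56, 45, 56, 75, 81, 67, 59]
print(mode_ungrouped(weights))  # [56]

# Hypothetical grouped frequencies (assumed for illustration only)
classes = {(45, 47): 3, (48, 50): 7, (51, 53): 12, (54, 56): 6}
print(mode_grouped(classes))  # 52.0

# A bimodal set returns both modes
print(mode_ungrouped([1, 1, 2, 3, 3]))  # [1, 3]
```

Returning a list rather than a single value keeps the unimodal and bimodal cases in one function.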
Mode and Measurement Scales
Can you find a mode for each data set?
• Nominal Scale, Nationality (1=American, 2=Asian, 3=Mexican): 1 2 1 3 3 2 3 3 3 1 2 1 2 3 3 2 1 2 3 2. Mode = 3
• Ordinal Scale, Football Poll (1=first, 2=second, 3=third, 4=fourth): 1 2 3 4 4 3 4 3 2 4 4 2 1 2 4 4 3 2 3 4. Mode = 4
• Interval Scale, IQ score: 112 132 112 113 112 150 125 114. Mode = 112
• Ratio Scale, Weight: 68 56 39 56 44 56 45 56 75 81 67 59. Mode = 56
“The Mode”
• It is not affected by extremely large or small values and is therefore a valuable measure of central tendency when such values occur.
• It can be found for ratio-level, interval-level, ordinal-level, and nominal-level data.
“The Median”
The Median is the 50th percentile of a distribution: the point where half of the observations fall below and half of the observations fall above.
In any distribution there will always be an equal number of cases above and below the Median.
Location
Oh my! Where is the median?
For an odd number of untied scores (11, 13, 18, 19, 20)
[Number line from 11 to 20]
The Median is the middle score when scores are arranged in rank order.
Median Location = (N+1)/2 = (5+1)/2 = 3rd
Median Score = 18
For an even number of untied scores (11, 15, 19, 20)
[Number line from 11 to 20]
The Median is halfway between the two central values when scores are arranged in rank order.
Median Location = (N+1)/2 = (4+1)/2 = 2.5th
Median Score = (15+19)/2 = 17
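The location rule above can be checked with a few lines of Python. This is a minimal sketch for untied scores; the helper names `median_location` and `median` are chosen for this example and are not part of the original slides.

```python
def median_location(n):
    """Rank position of the median among n ordered scores: (N + 1) / 2."""
    return (n + 1) / 2


def median(scores):
    """Middle score for odd N; halfway between the two central scores for even N."""
    ordered = sorted(scores)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


odd = [11, 13, 18, 19, 20]
even = [11, 15, 19, 20]
print(median_location(len(odd)), median(odd))    # 3.0 18
print(median_location(len(even)), median(even))  # 2.5 17.0
```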
The Median of a group of scores is the point on the number line such that the sum of the absolute distances of all scores to that point is smaller than the sum of the distances to any other point.
There is a unique median for each data set.
It is not affected by extremely large or small values and is therefore a valuable measure of central tendency when such values occur.
The Median can be computed for
• Ordinal-level data, or
• Interval-level data, or
• Ratio-level data.
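The minimum-distance property of the median can be illustrated numerically. The sketch below reuses the odd-numbered example scores (11, 13, 18, 19, 20) and searches a grid of candidate points; the grid and the helper name `sum_abs_dev` are assumptions made for this illustration, not a general proof.

```python
def sum_abs_dev(scores, point):
    """Sum of absolute distances from every score to the given point."""
    return sum(abs(x - point) for x in scores)


scores = [11, 13, 18, 19, 20]

# Candidate points from 10.0 to 21.0 in steps of 0.5 (an arbitrary search grid)
candidates = [10 + 0.5 * k for k in range(23)]
best = min(candidates, key=lambda p: sum_abs_dev(scores, p))

print(best, sum_abs_dev(scores, best))  # 18.0 15.0
print(sum_abs_dev(scores, 17.5), sum_abs_dev(scores, 18.5))  # 15.5 15.5
```

The smallest total distance is reached at 18, the median of these scores, and moving half a point in either direction increases it.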
Median and Levels of Measurement
Can you find a median for each type of data?
• Nationality (nominal): 1 2 1 3 3 2 3 3 3 1 2 1 2 3 3 2 1 2 3 2. No
• Football Poll (ordinal): 1 2 3 4 4 3 4 3 2 4 4 2 1 2 4 4 3 2 3 4. Yes
• IQ score (interval): 112 132 112 113 112 150 125 114. Yes
• Weight (ratio): 68 56 39 56 44 56 45 56 75 81 67 59. Yes
The Mean
Population mean
Definition: For ungrouped data, the population mean is the sum of all the population values divided by the total number of population values. To compute the population mean, use the following formula.
$$\mu = \frac{\sum X}{N}$$
where $\mu$ is the population mean, $\sum$ (sigma) denotes summation, $X$ is an individual value, and $N$ is the population size.
THE SAMPLE MEAN
Definition: For ungrouped data, the sample mean is the sum of all the sample values divided by the number of sample values. To compute the sample mean, use the following formula.
$$\bar{X} = \frac{\sum X}{n}$$
where $\bar{X}$ (X-bar) is the sample mean, $\sum$ (sigma) denotes summation, $X$ is an individual value, and $n$ is the sample size.
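As a quick illustration of the definition, the mean is the sum of the values divided by how many values there are. The sketch below applies it to the IQ scores listed earlier in the slides; the function name `mean` is simply chosen for this example, and the same arithmetic serves for a population or a sample.

```python
def mean(values):
    """Sum of all values divided by the number of values."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


# IQ scores from the slides, treated here as a sample of n = 8
iq_scores = [112, 132, 112, 113, 112, 150, 125, 114]
print(sum(iq_scores), len(iq_scores))  # 970 8
print(mean(iq_scores))                 # 121.25
```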
Characteristics of The Mean
Center of Gravity of a Distribution
[Figure: number line from 1 to 8 with the mean marked as the balance point]
Data set 1: 25 27 29 31 33 35 37 (Mean = 31)
Data set 2: 31 31 31 31 31 31 31 (Mean = 31)
How much error do you expect for each case?
The Mean
Deviation Scores (each score minus the mean): -6 -4 -2 0 2 4 6
On average, I feel fine
It’s too cold!
It’s too hot!
The Mean of a group of scores is the point on the number line such that the sum of the squared differences between the scores and the mean is smaller than the sum of the squared differences to any other point. If you summed the differences without squaring them, the result would be zero.
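Both properties can be checked numerically. The sketch below assumes the data set reads 25, 27, 29, 31, 33, 35, 37 (a reading of the slide's figure that matches the deviation scores -6 to 6); the helper name `sum_sq_dev` is introduced only for this example.

```python
def sum_sq_dev(scores, point):
    """Sum of squared differences between every score and the given point."""
    return sum((x - point) ** 2 for x in scores)


# Assumed reading of the slide's data set
scores = [25, 27, 29, 31, 33, 35, 37]
m = sum(scores) / len(scores)

deviations = [x - m for x in scores]
print(m)                # 31.0
print(deviations)       # [-6.0, -4.0, -2.0, 0.0, 2.0, 4.0, 6.0]
print(sum(deviations))  # 0.0

# Squared deviations are smallest around the mean
print(sum_sq_dev(scores, m))                           # 112.0
print(sum_sq_dev(scores, 30), sum_sq_dev(scores, 32))  # 119 119
```

Shifting the reference point one unit away from the mean in either direction raises the sum of squares from 112 to 119.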
Mean and Measurement Scales
Every set of interval-level and ratio-level data has a mean.
Each data set below is coded 1 2 3, so the arithmetic gives a mean of 2 in every case. Is that mean meaningful?
• Nominal data, Nationality (1=American, 2=Asian, 3=Mexican): NO
• Ordinal data, Football Poll (1=first, 2=second, 3=third): NO
• Interval data, IQ Test: YES
• Ratio data, Weight: YES
The Mean
• Every set of interval-level and ratio-level data has a mean.
• All the values are included in computing the mean: $\bar{X} = \frac{\sum X}{n}$.
• A set of data has a unique mean.
• The mean is affected by unusually large or small data values.
• The arithmetic mean is the only measure of central tendency where the sum of the deviations of each value from the mean is zero.
[Figure: The Mean, showing how an unusually large value shifts the mean]
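The sensitivity of the mean to extreme values can be seen by adding a single outlier to a small data set. The scores below are hypothetical, invented for this illustration (they are not the values from the slide's figure), and the helper `summarize` is likewise an assumption; the `mean`, `median`, and `mode` functions come from Python's standard `statistics` module.

```python
from statistics import mean, median, mode


def summarize(scores):
    """Return the (mean, median, mode) of a list of scores."""
    return mean(scores), median(scores), mode(scores)


# Hypothetical scores, with and without one extreme value
scores = [1, 3, 5, 5, 7, 9]
with_outlier = scores + [600]

print(summarize(scores))        # mean 5, median 5, mode 5
print(summarize(with_outlier))  # mean 90, median 5, mode 5
```

One extreme score moves the mean from 5 to 90, while the median and the mode stay at 5.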
The Relationships between Measures of Central Tendency and Shape of a Distribution
Symmetric Unimodal Distribution (Normal Distribution)
Mean = Median = Mode
Positively Skewed Distribution
Mode < Median < Mean
• The median falls closer to the mean than to the mode.
• With unimodal curves of moderate asymmetry, the distance from the median to the mode is approximately twice the distance between the median and the mean.
Negatively Skewed Distribution
Mode > Median > Mean
• The median falls closer to the mean than to the mode.
Bimodal Distribution (symmetric)
Mode1 < Mean = Median < Mode2
If two averages of a moderately skewed frequency distribution are known, the third can be approximated.
The formulae are:
Mode = Mean - 3(Mean - Median)
Mean = [3(Median) - Mode]/ 2
Median = [2(Mean) + Mode]/ 3
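The three approximations are rearrangements of one relationship, which a short sketch can make concrete. The function names and the example values (mean 60, median 58) are hypothetical, and the results are only rough estimates for moderately skewed unimodal distributions.

```python
def approx_mode(mean, median):
    """Mode = Mean - 3(Mean - Median)."""
    return mean - 3 * (mean - median)


def approx_mean(median, mode):
    """Mean = [3(Median) - Mode] / 2."""
    return (3 * median - mode) / 2


def approx_median(mean, mode):
    """Median = [2(Mean) + Mode] / 3."""
    return (2 * mean + mode) / 3


# Hypothetical positively skewed distribution: mean 60, median 58
mode_est = approx_mode(60, 58)
print(mode_est)                     # 54
print(approx_mean(58, mode_est))    # 60.0
print(approx_median(60, mode_est))  # 58.0
```

In this example the median sits 4 points above the estimated mode and 2 points below the mean, which matches the "twice the distance" rule for moderately asymmetric curves.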
Measures of Central Tendency as Inferential Statistics
• Parameters (population): Mean, Median, Mode
• Statistics (sample, obtained by sampling): Mean, Median, Mode
• Sampling Errors: the difference between a parameter and the corresponding statistic
As inferential measures, the Mean will be used much more frequently than the Median or Mode.
Why?
On the average, there is less sampling error associated with the Mean than with the Median, and the Mode tends to have more sampling error than the Median. In other words, the difference between the sample mean (X-bar) and the population mean tends to be smaller than the difference between the sample median (Md) and the population median (Mdpop).
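A small simulation can illustrate the claim about sampling error. The sketch below makes several assumptions that are not in the slides: a normally distributed population with mean 100 and standard deviation 15, samples of size 25, 2000 repetitions, and a fixed random seed. The function name `sampling_errors` is also made up for the example.

```python
import random
from statistics import mean, median, pstdev


def sampling_errors(n=25, reps=2000, mu=100.0, sigma=15.0, seed=0):
    """Spread of (statistic - parameter) for the sample mean and sample median.

    Assumes a normal population, where the population mean and the
    population median are both equal to mu.
    """
    rng = random.Random(seed)
    mean_errors, median_errors = [], []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        mean_errors.append(mean(sample) - mu)
        median_errors.append(median(sample) - mu)
    return pstdev(mean_errors), pstdev(median_errors)


sd_mean, sd_median = sampling_errors()
print(f"spread of sample-mean errors:   {sd_mean:.2f}")
print(f"spread of sample-median errors: {sd_median:.2f}")
print(sd_mean < sd_median)  # expected to be True under these assumptions
```

Under these assumptions the errors of the sample mean cluster more tightly around zero than those of the sample median. The result depends on the shape of the population, so the comparison can come out differently for heavily skewed or heavy-tailed data.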
SUMMARY
There are three common measures of central tendency. The mean is the most widely used and the most precise for inferential purposes, and it is the foundation for statistical concepts that will be introduced in subsequent classes. The mean is the ratio of the sum of the observations to the number of observations. The value of the mean is influenced by the value of every score in a distribution. Consequently, in skewed distributions it is drawn toward the elongated tail more than is the median or mode.
The median is the 50th percentile of a distribution. It is the point in a distribution from which the sum of the absolute differences of all scores is at a minimum. In perfectly symmetrical distributions the median and mean have the same value. When the mean and median differ greatly, the median is usually the most meaningful measure of central tendency for descriptive purposes.
The mode, unlike the mean and median, has descriptive meaning even with nominal scales of measurement. The mode is the most frequently occurring observation. When the median or mean is applicable, the mode is the least useful measure of central tendency. In a symmetrical unimodal distribution the mode, median, and mean have the same value.