statistics trinity college
Statistics
Stacy Cater
Question 1
11 31 18 13 11 3 1 1 6 1 4
4 - 6 hours
6.5 hours
Histogram
A histogram is chosen to represent “continuous numerical data”, that is, data representing a quantity that can take on any value in a certain range.
Distribution of Data
Positively Skewed Distribution
Also known as a right-skewed distribution.
Negatively Skewed Distribution
Also known as a left-skewed distribution.
Symmetric Distribution
A distribution is symmetric if the values smaller and larger than its midpoint are mirror images of each other.
Question 2
Standard Deviation
Two classes took a recent test. There were 10 students in each class, and each class had an average score of 81.5%
Since the averages are the same, can we assume that the students in both classes all did pretty much the same on the exam?
The answer is… No.
The average (mean) does not tell us anything about the distribution or variation in the grades.
Here are dot plots of the grades in each class, with the mean marked on each:
So, we need to come up with some way of measuring not just
the average, but also the spread of the distribution of our
data.
Why not just give an average and the
range of data (the highest and lowest values) to describe
the distribution of the data?
Well, for example, let’s say that for a set of data the average is 17.95 and the range is 23.
But what if the data looked like this:
Here is the average
And here is the range
But really, most of the
numbers are in this area, and are not evenly
distributed throughout the
range.
The Standard Deviation is a number that measures how far the numbers in a set of data are from their mean.
If the Standard Deviation is large, it means the numbers are spread out from their mean.
If the Standard Deviation is small, it means the numbers are close to their mean.
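In symbols, the version of the standard deviation worked through in these slides (the one that divides by n - 1) can be written as below. The notation is an assumption added for illustration and does not appear in the slides: x_i is each value, x̄ is the mean, n is the number of values, and s is the standard deviation.

```latex
s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}
```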
Here are the scores on the math test for Team A:
72, 76, 80, 80, 81, 83, 84, 85, 85, 89
Average: 81.5
The Standard Deviation measures how far the numbers in a set of data are from their mean.
For example, start with the lowest score, 72. How far away is 72 from the mean of 81.5?
72 - 81.5 = -9.5
Or, start with the highest score, 89. How far away is 89 from the mean of 81.5?
89 - 81.5 = 7.5
So, the first step to finding the Standard Deviation is to find all the distances from the mean.
Score   Distance from Mean
72      -9.5
76      -5.5
80      -1.5
80      -1.5
81      -0.5
83      1.5
84      2.5
85      3.5
85      3.5
89      7.5
Next, you need to square each of the distances to turn them all into positive numbers.
Score   Distance from Mean   Distance Squared
72      -9.5                 90.25
76      -5.5                 30.25
80      -1.5                 2.25
80      -1.5                 2.25
81      -0.5                 0.25
83      1.5                  2.25
84      2.5                  6.25
85      3.5                  12.25
85      3.5                  12.25
89      7.5                  56.25
Add up all of the squared distances.
Sum: 214.5
Divide by (n - 1), where n represents the amount of numbers you have.
Sum ÷ (n - 1) = 214.5 ÷ (10 - 1) = 23.8
Finally, take the square root of the average squared distance.
√23.8 = 4.88
This is the Standard Deviation
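The steps above can be sketched in Python as a way of checking the hand working. This is a minimal illustration, not part of the original slides; the function name `sample_standard_deviation` and the variable names are assumptions chosen for readability.

```python
import math

# Team A scores from the slides.
scores = [72, 76, 80, 80, 81, 83, 84, 85, 85, 89]


def sample_standard_deviation(values):
    """Follow the slide steps and return every intermediate result."""
    n = len(values)
    mean = sum(values) / n
    # Step 1: distance of each value from the mean.
    distances = [x - mean for x in values]
    # Step 2: square each distance so none are negative.
    squares = [d ** 2 for d in distances]
    # Step 3: add up the squared distances.
    total = sum(squares)
    # Step 4: divide by (n - 1).
    variance = total / (n - 1)
    # Step 5: take the square root.
    sd = math.sqrt(variance)
    return mean, distances, squares, total, variance, sd


mean, distances, squares, total, variance, sd = sample_standard_deviation(scores)
print(f"Mean: {mean}")
print(f"Sum of squared distances: {total}")
print(f"Divided by (n - 1): {variance:.1f}")
print(f"Standard deviation: {sd:.2f}")
```

Returning the intermediate lists makes it easy to compare each column of the table with the computed values.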
Now find the Standard Deviation for the other class grades (Team B).
Score   Distance from Mean   Distance Squared
57      -24.5                600.25
65      -16.5                272.25
83      1.5                  2.25
94      12.5                 156.25
95      13.5                 182.25
96      14.5                 210.25
98      16.5                 272.25
93      11.5                 132.25
71      -10.5                110.25
63      -18.5                342.25
Sum: 2280.5
2280.5 ÷ (10 - 1) = 253.4
√253.4 = 15.92
Now, let’s compare the two classes again.
                      Team A   Team B
Average on the Test   81.5     81.5
Standard Deviation    4.88     15.92
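The comparison can also be reproduced with Python's standard `statistics` module, whose `stdev` function divides by (n - 1) in the same way as the hand working. This is a sketch for checking the figures only; the helper name `summarize` is an assumption. Results rounded to two decimal places can differ in the last digit from hand working that rounds or truncates intermediate values.

```python
import statistics

# Scores for both classes, as listed in the slides.
team_a = [72, 76, 80, 80, 81, 83, 84, 85, 85, 89]
team_b = [57, 65, 83, 94, 95, 96, 98, 93, 71, 63]


def summarize(scores):
    """Return (mean, sample standard deviation) for a list of scores."""
    return statistics.mean(scores), statistics.stdev(scores)


for name, scores in (("Team A", team_a), ("Team B", team_b)):
    mean, sd = summarize(scores)
    print(f"{name}: mean = {mean:.1f}, standard deviation = {sd:.2f}")
```

Both classes share the same mean, so only the second number shows how differently the grades are spread.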
You have to be able to calculate standard deviation using your calculator!
Try using the scores for Team A:
72, 76, 80, 80, 81, 83, 84, 85, 85, 89
ANS: 4.88
Note:
Measures of central tendency (mean, mode and median) and variability are known as SUMMARY STATISTICS.
Question 3
Solution
93.725360
X 3601
= 824 schools
Try Some Questions
2011 Paper - Q 7 (i)
Q7 (ii)
Q 7 (b) (i)
Q 7 (c)
Say Bye to Univariate & Hello to Bivariate Data
Two variables
Tied or paired together
Two-dimensional data
Bivariate Data
Deals with causes or relationships
The major purpose of bivariate analysis is to determine whether relationships exist.
Each observation is composed of..
National Institutes of Health (NIH)
Sedentary activities (like TV watching) are associated with an increase in obesity and an increase in the risk of diabetes in women.
Anger expression may be inversely related to the risk of heart attack and stroke. (Those who express anger may have a decreased risk).
Light to moderate drinking reduces the risk of heart disease in men.
News Reporters love to tell stories about the latest links!
Such as..
Does having her first baby later in life cause a woman to live longer? (New York Times)
‘Count Cricket Chirps to Gauge Temperature’
(Garden Gate)
What you have to do!
1. Find a cricket.
2. Count the number of times it chirps in 15 seconds.
3. Add 40.
You’ve just predicted the temperature in degrees Fahrenheit!
No. of Chirps in 15 sec   Temperature (in degrees Fahrenheit)
18                        57
20                        60
21                        64
23                        65
27                        68
30                        71
34                        74
39                        77
Table 18-1 Cricket Chirps and Temperature Data (Excerpt)
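As a quick check of the "count the chirps and add 40" rule against Table 18-1, the following sketch compares each prediction with the recorded temperature. The function name `predict_temperature` and the list name `cricket_data` are assumptions made for this illustration.

```python
# (chirps in 15 seconds, recorded temperature in degrees Fahrenheit)
# taken from Table 18-1.
cricket_data = [
    (18, 57), (20, 60), (21, 64), (23, 65),
    (27, 68), (30, 71), (34, 74), (39, 77),
]


def predict_temperature(chirps_in_15_seconds):
    """Rule of thumb from the slides: chirps in 15 seconds plus 40."""
    return chirps_in_15_seconds + 40


for chirps, recorded in cricket_data:
    predicted = predict_temperature(chirps)
    error = predicted - recorded
    print(f"{chirps:>2} chirps: predicted {predicted}, recorded {recorded}, error {error:+d}")
```

The predictions are close to the recorded values but not exact, which is what you would expect from a rule of thumb.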
Let’s see another example!
A Press Release by Ohio State University Medical Center
The headline says that...
“aspirin can prevent polyps in colon cancer patients”
Raw Data for this Study
ID NO. 22292 GROUP=ASPIRIN DEVELOPED POLYPS=NO
(635 LINES)
Table 18-2 Summary of Aspirin vs. Polyps Study Results
Group         % Developing Polyps*
Aspirin       17
Non-aspirin   27
*Total sample size = 635 (approximately half were randomly assigned to each group)
Scatter Plots
Bivariate Numerical Data
Two Dimensions
Horizontal dimension (x-axis)
Vertical dimension (y-axis)
Scatter Plot of cricket chirps versus outdoor temperature.
Interpreting a Scatterplot
You do this by looking for trends in the data as you go from left to right.
Positive linear relationship
Proportional relationship
As x increases (moves right one unit), y increases (moves up) a certain amount.
Negative linear relationship
Inverse relationship
As x increases, y decreases (moves down) a certain amount.
If the data don’t seem to resemble any kind of line (even a vague one) this means that no linear relationship exists.
Positive Linear Relationship
As the number of cricket chirps increases, so does the temperature.
Example
Age of Car
Value of Car (£)
Quantifying the Relationship
Quantify or measure the extent and nature of the relationship.
We have already seen how to measure the direction of a linear relationship BUT you will also have to decide on the STRENGTH of the relationship!!
Introduce the...
Correlation Coefficient
Measures the strength and direction of the linear relationship between x and y (or the horizontal and vertical dimensions).
Calculating the C.C.
It is represented by the letter r
It has a value between - 1 and 1
You only have to be able to calculate it using your calculator, luckily for you!
If r is close to 1, then there is a strong positive correlation between two sets of data.
If r is close to -1, we say there is a strong negative correlation between the two sets.
If r is close to 0, then there is little or no linear correlation between the two sets.
Most statisticians like to see correlations above 0.6 or below -0.6.
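Although the slides only require a calculator, the correlation coefficient can be sketched in Python to show what the calculator is doing. The example below uses the cricket data from Table 18-1 and the usual Pearson formula. The function names, and the wording returned by `describe`, are assumptions made for illustration; the 0.6 cut-off is the guideline quoted above, not a fixed rule.

```python
import math

# Cricket data from Table 18-1.
chirps = [18, 20, 21, 23, 27, 30, 34, 39]
temperatures = [57, 60, 64, 65, 68, 71, 74, 77]


def correlation_coefficient(xs, ys):
    """Pearson correlation coefficient r for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / math.sqrt(sum_xx * sum_yy)


def describe(r, threshold=0.6):
    """State direction and strength using the 0.6 guideline (an assumption)."""
    if r > threshold:
        return "strong positive"
    if r < -threshold:
        return "strong negative"
    return "weak or no linear correlation"


r = correlation_coefficient(chirps, temperatures)
print(f"r = {r:.2f} ({describe(r)})")
```

Changing `threshold` shows how the verbal description depends on where the "strong" cut-off is placed.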
Types of Correlation
It is important you state the Direction and the Strength of a Correlation
Correlation Coefficient = 0.99 Correlation coefficient = 0.5
A positive correlation means that high values of one variable are associated with high values of a second variable. The relationship between height and weight, between IQ scores and achievement test scores, and between self-concept and grades are examples of positive correlation.
Correlation Coefficient = - 0.99 Correlation Coefficient = - 0.5
A negative correlation or relationship means that high values of one variable are associated with low values of a second variable. Examples of negative correlations include those between exercise and heart failure, between successful test performance and feelings of incompetence, and between absence from school and school achievement.
No Correlation
Correlation Coefficient = -0.16
Scatter Plot of cricket chirps versus outdoor temperature.
Correlation of 0.98!
Correlation versus
Causation
The amount of fuel burned by a car depends on the size of its engine, since bigger engines burn more petrol. We say there is a CAUSAL RELATIONSHIP between the amount of petrol used and the size of the car’s engine.
If two variables are found to be either associated or correlated, that doesn’t necessarily mean that a cause-and-effect relationship exists between the two variables.
If we find a statistical relationship between two variables, then we cannot always conclude that one of the variables is the cause of the other, i.e. correlation does not always imply causality.
For example:
Between 1980 and 2000 there was a large increase in sales of calculators and computers!
There was a strong positive correlation between the sales of computers and the sales of calculators!
Did the increase of sales of calculators cause an increase in the sale of
computers??
NO!!!!
Production Costs Decreased
Cost of Production was a third variable causing the other two to
increase.
We call this third variable a LURKING VARIABLE.
Linear Regression
Line of Best Fit
After you’ve found a relationship between two variables
and you have some way of quantifying this relationship, you can create a model that allows you to use one variable to predict
another.
1. Draw a scatter plot.
2. If the graph suggests a linear relationship...
3. Calculate the correlation coefficient.
4. Find the equation of the line that best fits the data.
- We draw this by eye, and then find its equation.
Because you have a strong correlation, be it positive or negative, you know that x is correlated with y.
If you know the slope and the y-intercept of that line, then you can plug in a value for x and predict the
average value for y.
In other words, you can predict y from x.
You should never do a regression analysis unless you’ve already found a strong correlation (either positive or negative) between the two variables!
Now Calculate Line!
Equation: y = mx + c
m = slope = (y2 - y1) / (x2 - x1), where (x1, y1) and (x2, y2) are points on the line of best fit.
Substitute m and one point into y - y1 = m(x - x1).
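The two-point calculation can be sketched in Python as follows. The points (20, 60) and (34, 74) are an assumption for illustration: they are two rows of the cricket table, used here as if a line of best fit drawn by eye happened to pass through them. The function names `line_through` and `predict` are also hypothetical.

```python
def line_through(point_1, point_2):
    """Return (m, c) for the line y = m*x + c through two points."""
    x1, y1 = point_1
    x2, y2 = point_2
    if x1 == x2:
        raise ValueError("A vertical line has no defined slope.")
    # Slope: change in y divided by change in x.
    m = (y2 - y1) / (x2 - x1)
    # Rearranging y - y1 = m(x - x1) at x = 0 gives the y-intercept.
    c = y1 - m * x1
    return m, c


def predict(m, c, x):
    """Predict y from x using the line y = m*x + c."""
    return m * x + c


# Assumed points on a line of best fit drawn by eye.
m, c = line_through((20, 60), (34, 74))
print(f"y = {m}x + {c}")
print(f"Predicted temperature for 25 chirps: {predict(m, c, 25)}")
```

With these particular points the line comes out as y = x + 40, which matches the "add 40" rule of thumb for cricket chirps; a different pair of points read off a hand-drawn line would give a slightly different equation.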
Let’s Sum up!
Types of Sampling
Populations and Samples
Types of Sampling
Bias in Sampling
Reliability of Data
Collecting Data
Frequency Tables
Stem-and-Leaf Diagram
Back-to-Back Stem-and-Leaf
Histograms
Distribution of Data
Scatter Graph
Correlation
Correlation Coefficient
Causality
Linear Regression
2011 paper 2 Q 2
2013 paper 2 Q 7
1st = run, 2nd = cycle, 3rd = swim
25 mins
3.17 mins
No modal time, but there is a modal class.
2012 paper 2 Q 7