statistics trinity college

Statistics

Stacy Cater

Question 1

11 31 18 13 11 3 1 1 6 1 4

4 - 6 hours

6.5 hours

Histogram

A histogram is chosen to represent “continuous numerical data”. That is, data that represents a quantity where the numbers can take on any value in a certain range.

Distribution of Data

Positively Skewed Distribution

Also known as a right-skewed distribution.

Negatively Skewed Distribution

Also known as a left-skewed distribution.

Symmetric Distribution

If the values smaller and larger than its midpoint are mirror images of each other.

Question 2

Standard Deviation

Two classes took a recent test. There were 10 students in each class, and each class had an average score of 81.5%.

Since the averages are the same, can we assume that the students in both classes all did pretty much the same on the exam?

The answer is… No.

The average (mean) does not tell us anything about the distribution or variation in the grades.

Here are Dot-Plots of the grades in each class:

So, we need to come up with some way of measuring not just the average, but also the spread of the distribution of our data.

Why not just give an average and the range of data (the highest and lowest values) to describe the distribution of the data?

Well, for example, let's say that for a set of data, the average is 17.95 and the range is 23. But what if the data looked like this:

Here is the average

And here is the range

But really, most of the numbers are in this area, and are not evenly distributed throughout the range.

The Standard Deviation is a number that measures how far away, on average, the numbers in a set of data are from the mean.

If the Standard Deviation is large, it means the numbers are spread out from their mean.

If the Standard Deviation is small, it means the numbers are close to their mean.

Here are the scores on the math test for Team A:

72, 76, 80, 80, 81, 83, 84, 85, 85, 89

Average: 81.5

The Standard Deviation measures how far away each number in a set of data is from their mean.

For example, start with the lowest score, 72. How far away is 72 from the mean of 81.5?

72 - 81.5 = -9.5

Or, start with the highest score, 89. How far away is 89 from the mean of 81.5?

89 - 81.5 = 7.5


So, the first step to finding the Standard Deviation is to find all the distances from the mean.

Score  Distance from Mean
72     -9.5
76     -5.5
80     -1.5
80     -1.5
81     -0.5
83      1.5
84      2.5
85      3.5
85      3.5
89      7.5
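It is worth checking what happens if you simply add these distances up. The short Python sketch below (Python is our choice here, the slides themselves only use a calculator) uses the Team A scores and suggests that the negative and positive distances cancel each other out, which is why the raw distances on their own cannot measure spread.

```python
# Team A scores from the slides
scores = [72, 76, 80, 80, 81, 83, 84, 85, 85, 89]

mean = sum(scores) / len(scores)            # the average, 81.5
distances = [x - mean for x in scores]      # signed distance of each score from the mean

print(distances)
print(sum(distances))   # the negatives and positives cancel out
```

Because the plain sum cancels to zero, some step that removes the signs is needed before averaging.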


Next, you need to square each of the distances to turn them all into positive numbers.

Score  Distance from Mean  Distances Squared
72     -9.5                90.25
76     -5.5                30.25
80     -1.5                 2.25
80     -1.5                 2.25
81     -0.5                 0.25
83      1.5                 2.25
84      2.5                 6.25
85      3.5                12.25
85      3.5                12.25
89      7.5                56.25

Add up all of the squared distances.

Sum: 214.5

Divide by (n - 1), where n represents how many numbers you have.

214.5 / (10 - 1) = 23.8

Finally, take the Square Root of the average squared distance.

√(214.5 / (10 - 1)) = √23.8 = 4.88

This is the Standard Deviation for Team A: 4.88
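The whole calculation can also be sketched in a few lines of Python. This is only an illustration of the steps on the slides, not a replacement for the calculator method; the function name `sample_standard_deviation` is our own.

```python
import math

def sample_standard_deviation(values):
    """Follow the slide steps: distances, squares, sum, divide by n - 1, square root."""
    n = len(values)
    mean = sum(values) / n
    distances = [x - mean for x in values]      # step 1: distance from the mean
    squared = [d ** 2 for d in distances]       # step 2: square each distance
    total = sum(squared)                        # step 3: add up the squared distances
    average_squared = total / (n - 1)           # step 4: divide by (n - 1)
    return math.sqrt(average_squared)           # step 5: take the square root

team_a = [72, 76, 80, 80, 81, 83, 84, 85, 85, 89]
sd_a = sample_standard_deviation(team_a)
print(round(sd_a, 2))   # 4.88
```

Dividing by (n - 1) rather than n gives what is usually called the sample standard deviation, which is the version worked through on these slides.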

Now find the Standard Deviation for the other class grades.

Score  Distance from Mean  Distances Squared
57     -24.5               600.25
65     -16.5               272.25
83       1.5                 2.25
94      12.5               156.25
95      13.5               182.25
96      14.5               210.25
98      16.5               272.25
93      11.5               132.25
71     -10.5               110.25
63     -18.5               342.25

Sum: 2280.5

2280.5 / (10 - 1) = 253.4

√253.4 = 15.91

Now, let's compare the two classes again.

                     Team A   Team B
Average on the Test  81.5     81.5
Standard Deviation   4.88     15.91
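If you want to check the comparison on a computer, Python's standard `statistics` module offers `mean` and `stdev` (the latter divides by n - 1, like the method above). The sketch below is just one way to do it; the last digit it prints may differ slightly from the slide values depending on how you round.

```python
import statistics

team_a = [72, 76, 80, 80, 81, 83, 84, 85, 85, 89]
team_b = [57, 65, 83, 94, 95, 96, 98, 93, 71, 63]

for name, scores in (("Team A", team_a), ("Team B", team_b)):
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)   # sample standard deviation (divides by n - 1)
    print(f"{name}: mean = {mean}, standard deviation = {sd:.3f}")
```

Same average, very different spread: Team B's scores sit much further from 81.5 than Team A's do.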

You have to be able to calculate standard deviation using your calculator!

Try!

Try using the scores for Team A:

72, 76, 80, 80, 81, 83, 84, 85, 85, 89

ANS: 4.88

Note:

Measures of central tendency (mean, mode and median) and measures of variability are known as SUMMARY STATISTICS.

Question 3

Solution

93.725360

X 3601

= 824 schools

Try Some Questions

2011 Paper - Q 7 (i)

Q7 (ii)

Q 7 (b) (i)

Q 7 (c)

Say Bye to Univariate & Hello to Bivariate Data

Two variables

Tied or paired together

Two-dimensional data

Bivariate Data

Deals with causes or relationships

The major purpose of bivariate analysis is to determine whether relationships exist.

Each observation is composed of..

National Institutes of Health (NIH)

Sedentary activities (like TV watching) are associated with an increase in obesity and an increase in the risk of diabetes in women.

Anger expression may be inversely related to the risk of heart attack and stroke. (Those who express anger may have a decreased risk).

Light to moderate drinking reduces the risk of heart disease in men.

News Reporters love to tell stories about the latest links!

Such as..

Does having her first baby later in life cause a woman to live longer? (New York Times)

‘Count Cricket Chirps to Gauge Temperature’ (Garden Gate)

What you have to do!

1. Find a cricket.
2. Count the number of times it chirps in 15 seconds.
3. Add 40.

You’ve just predicted the temperature in degrees Fahrenheit!

No. of Chirps in 15 sec  Temperature (in degrees Fahrenheit)
18                       57
20                       60
21                       64
23                       65
27                       68
30                       71
34                       74
39                       77

Table 18-1 Cricket Chirps and Temperature Data (Excerpt)
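To see how well the "add 40" rule holds up against the data in Table 18-1, here is a small Python sketch. The function name `predict_temperature` is ours, and the rule is only the rough rule of thumb quoted above, not an exact law.

```python
# (chirps in 15 seconds, recorded temperature in degrees Fahrenheit) from Table 18-1
data = [(18, 57), (20, 60), (21, 64), (23, 65),
        (27, 68), (30, 71), (34, 74), (39, 77)]

def predict_temperature(chirps):
    """Rule of thumb from the slides: chirps in 15 seconds, plus 40."""
    return chirps + 40

for chirps, actual in data:
    predicted = predict_temperature(chirps)
    # show the rule's prediction next to the recorded temperature
    print(chirps, actual, predicted, predicted - actual)
```

The last column shows how far each prediction is from the recorded temperature, so you can judge for yourself how good the rule is.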

Let's see another example!

A Press Release by Ohio State University Medical Center

The headline says that...

“aspirin can prevent polyps in colon cancer patients”

Raw Data for this Study

ID NO. 22292 GROUP=ASPIRIN DEVELOPED POLYPS=NO

(635 LINES)

Table 18-2 Summary of Aspirin vs. Polyps Study Results

Group        % Developing Polyps*
Aspirin      17
Non-aspirin  27

*total sample size = 635 (approximately half were randomly assigned to each group)

Scatter Plots

Bivariate Numerical Data

Two Dimensions

Horizontal dimension (x-axis)

Vertical dimension (y-axis)

Scatter Plot of cricket chirps versus outdoor temperature.

Interpreting a Scatterplot

You do this by looking for trends in the data as you go from left to right.

Positive linear relationship

Proportional relationship

As x increases (moves right one unit), y increases (moves up) a certain amount.

Negative linear relationship

Inverse relationship

As x increases, y decreases (moves down) a certain amount.

If the data don’t seem to resemble any kind of line (even a vague one) this means that no linear relationship exists.

Positive Linear Relationship

As the number of cricket chirps increases, so does the temperature.

Example

Age of Car

Value of Car (£)

Quantifying the Relationship

Quantify or measure the extent and nature of the relationship.

We have already seen how to describe the direction of a linear relationship BUT you will also have to decide on the STRENGTH of the relationship!!

Introduce the...

Correlation Coefficient

Measures the strength and direction of the linear relationship between x and y (the horizontal and vertical dimensions).

Calculating the C.C.

It is represented by the letter r

It has a value between - 1 and 1

You only have to be able to calculate it using your calculator, luckily for you!

If r is close to 1, then there is a strong positive correlation between two sets of data.

If r is close to -1, we say there is a strong negative correlation between the two sets.

If r is close to 0, then there is little or no linear correlation between the two sets.

Most statisticians like to see correlations above 0.6 or below -0.6.

Types of Correlation

It is important you state the Direction and the Strength of a Correlation

Correlation Coefficient = 0.99 Correlation coefficient = 0.5

A positive correlation means that high values of one variable are associated with high values of a second variable. The relationship between height and weight, between IQ scores and achievement test scores, and between self-concept and grades are examples of positive correlation.

Correlation Coefficient = - 0.99 Correlation Coefficient = - 0.5

A negative correlation or relationship means that high values of one variable are associated with low values of a second variable. Examples of negative correlations include those between exercise and heart failure, between successful test performance and feelings of incompetence, and between absence from school and school achievement.

No CORRELATION

Correlation Coefficient = -0.16

Scatter Plot of cricket chirps versus outdoor temperature.

Correlation of 0.98!
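You only need the calculator for the exam, but if you are curious where a figure like this comes from, the sketch below computes r for the excerpt of data in Table 18-1 using the usual formula (sum of products of distances from the means, divided by the square root of the product of the sums of squared distances). The function name `correlation_coefficient` is our own.

```python
import math

chirps = [18, 20, 21, 23, 27, 30, 34, 39]   # x: chirps in 15 seconds
temps = [57, 60, 64, 65, 68, 71, 74, 77]    # y: temperature in degrees Fahrenheit

def correlation_coefficient(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # sums of products and squares of the distances from each mean
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

r = correlation_coefficient(chirps, temps)
print(round(r, 2))   # 0.98
```

A value this close to 1 is what a strong positive correlation looks like in numbers.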

Correlation versus Causation

The amount of fuel burned by a car depends on the size of its engine, since bigger engines burn more petrol. We say there is a CAUSAL RELATIONSHIP between the amount of petrol used and the size of the car's engine.

If two variables are found to be either associated or correlated, that doesn’t necessarily mean that a cause-and-effect relationship exists between the two variables.

If we find a statistical relationship between two variables, then we cannot always conclude that one of the variables is the cause of the other, i.e. correlation does not always imply causality.

Between 1980 and 2000 there was a large increase in sales of calculators and computers!

There was a strong positive correlation between the sales of computers and the sales of calculators!

For Example..

Did the increase of sales of calculators cause an increase in the sale of

computers??

NO!!!!

Production Costs Decreased

Cost of Production was a third variable causing the other two to increase.

We call this third variable a LURKING VARIABLE.

Linear Regression

Line of Best Fit

After you’ve found a relationship between two variables and you have some way of quantifying this relationship, you can create a model that allows you to use one variable to predict another.

1. Draw a Scatter Plot.
2. If the graph suggests a linear relationship...
3. Calculate the Correlation Coefficient.
4. Find the equation of the Line that best fits the data.
   - We draw this by eye, and then find its equation.

Because you have a strong correlation, be it positive or negative, you know that x is correlated with y.

If you know the slope and the y-intercept of that line, then you can plug in a value for x and predict the average value for y.

In other words, you can predict y from x.

You should never do a regression analysis unless you’ve already found a strong correlation (either positive or negative) between the two variables!

Now Calculate Line!

Equation: y = mx + c

m = slope = (y2 - y1) / (x2 - x1), where (x1, y1) and (x2, y2) are points on the line of best fit.

Substitute m and one point into y - y1 = m(x - x1).
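As a rough illustration, the Python sketch below finds the equation of a line from two points and then uses it to predict y from x. The two points (20, 61) and (40, 79) are hypothetical values, imagined as read off a line drawn by eye through the cricket scatter plot; they are not taken from the slides. The function names are also our own.

```python
def line_through(p1, p2):
    """Return slope m and y-intercept c of the line through two points."""
    (x1, y1), (x2, y2) = p1, p2
    m = (y2 - y1) / (x2 - x1)     # slope
    c = y1 - m * x1               # rearranging y - y1 = m(x - x1) into y = mx + c
    return m, c

# Hypothetical points read off a hand-drawn line of best fit (assumption)
m, c = line_through((20, 61), (40, 79))

def predict(x):
    """Predict y from x using y = mx + c."""
    return m * x + c

print(m, c)
print(predict(30))   # predicted temperature for 30 chirps in 15 seconds
```

A different line drawn by eye would give slightly different values of m and c, which is why answers to these questions are usually marked within a tolerance.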

Let’s Sum up!

Types of Sampling

Populations and Samples

Types of Sampling

Bias in Sampling

Reliability of Data

Collecting Data

Frequency Tables

Stem-and-Leaf Diagram

Back-to-Back Stem-and-Leaf

Histograms

Distribution of Data

Scatter Graph

Correlation

Correlation Coefficient

Causality

Linear Regression

2011 paper 2 Q 2

2013 paper 2 Q 7

1st= run 2nd= cycle 3rd=swim

25 mins

3.17 mins

no modal time but modal class.

2012 paper 2 Q 7
