bivariate data as 90425 (3 credits) complete a statistical investigation involving bi-variate data

34
Bivariate Data AS 90425 (3 credits) Complete a statistical investigation involving bi- variate data

Upload: peregrine-franklin

Post on 14-Jan-2016

217 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Bivariate Data AS 90425 (3 credits) Complete a statistical investigation involving bi-variate data

Bivariate Data

AS 90425 (3 credits)

Complete a statistical investigation involving bi-variate data

Page 2: Bivariate Data AS 90425 (3 credits) Complete a statistical investigation involving bi-variate data

Scatter Plots

The scatter plot is the basic tool used to investigate relationships between two quantitative variables.

Types of Variables

Quantitative(measurements/

counts)

Qualitative(groups)

Page 3: Bivariate Data AS 90425 (3 credits) Complete a statistical investigation involving bi-variate data

What do I see in these scatter plots?

There appears to be a linear trend.

There appears to be moderate constant scatter about the trend line.

Negative Association.

No outliers or groupings visible.

454035

20

19

18

17

16

15

14

Latitude (°S)   

Mean January Air Temperatures for 30 New Zealand Locations

Tem

pera

ture

(°C

)

Page 4: Bivariate Data AS 90425 (3 credits) Complete a statistical investigation involving bi-variate data

What do I see in these scatter plots?

There appears to be a non-linear trend.

There appears to be non-constant scatter about the trend line.

Positive Association. One possible outlier

(Large GDP, low % Internet Users).

0 10 20 30 40

GDP per capita (thousands of dollars)

0

10

20

30

40

50

60

70

80

Inte

rnet

Use

rs (

%)

% of population who are Internet Users vs GDP per capita for 202 Countries

Page 5: Bivariate Data AS 90425 (3 credits) Complete a statistical investigation involving bi-variate data

What do I see in these scatter plots?

Two non-linear trends (Male and Female).

Very little scatter about the trend lines

Negative association until about 1970, then a positive association.

Gap in the data collection (Second World War).Year

1990198019701960195019401930

30

28

26

24

22

20

Ag

e

Average Age New Zealanders are First Married

Page 6: Bivariate Data AS 90425 (3 credits) Complete a statistical investigation involving bi-variate data

What do I look for in scatter plots?

Trend Do you see

a linear trend… straight line

OR

a non-linear trend?

Page 7: Bivariate Data AS 90425 (3 credits) Complete a statistical investigation involving bi-variate data

What do I look for in scatter plots?

Trend Do you see

a linear trend… straight line

OR

a non-linear trend?

Page 8: Bivariate Data AS 90425 (3 credits) Complete a statistical investigation involving bi-variate data

What do I look for in scatter plots?

Trend Do you see

a positive association… as one variable gets bigger, so does the other

OR a negative association?

as one variable gets bigger, the other gets smaller

Page 9: Bivariate Data AS 90425 (3 credits) Complete a statistical investigation involving bi-variate data

What do I look for in scatter plots?

Trend Do you see

a positive association… as one variable gets bigger, so does the other

OR a negative association?

as one variable gets bigger, the other gets smaller

Page 10: Bivariate Data AS 90425 (3 credits) Complete a statistical investigation involving bi-variate data

What do I look for in scatter plots?

Scatter Do you see

a strong relationship… little scatter

OR

a weak relationship? lots of scatter

Page 11: Bivariate Data AS 90425 (3 credits) Complete a statistical investigation involving bi-variate data

What do I look for in scatter plots?

Scatter Do you see

a strong relationship… little scatter

OR

a weak relationship? lots of scatter

Page 12: Bivariate Data AS 90425 (3 credits) Complete a statistical investigation involving bi-variate data

What do I look for in scatter plots?

Scatter Do you see

constant scatter… roughly the same amount of scatter as you look across the plot

or non-constant scatter?

the scatter looks like a “fan” or “funnel”

Page 13: Bivariate Data AS 90425 (3 credits) Complete a statistical investigation involving bi-variate data

What do I look for in scatter plots?

Scatter Do you see

constant scatter… roughly the same amount of scatter as you look across the plot

or non-constant scatter?

the scatter looks like a “fan” or “funnel”

Page 14: Bivariate Data AS 90425 (3 credits) Complete a statistical investigation involving bi-variate data

What do I look for in scatter plots?

Anything unusual Do you see

any outliers? unusually far from the trend

any groupings?

Page 15: Bivariate Data AS 90425 (3 credits) Complete a statistical investigation involving bi-variate data

What do I look for in scatter plots?

Anything unusual Do you see

any outliers? unusually far from the trend

any groupings?

Page 16: Bivariate Data AS 90425 (3 credits) Complete a statistical investigation involving bi-variate data

Rank these relationships from weakest (1) to strongest (4):

Page 17: Bivariate Data AS 90425 (3 credits) Complete a statistical investigation involving bi-variate data

Rank these relationships from weakest (1) to strongest (4):

2

1

4

3

Page 18: Bivariate Data AS 90425 (3 credits) Complete a statistical investigation involving bi-variate data

Rank these relationships from weakest (1) to strongest (4):

How did you make your decisions? The less scatter there is about the

trend line, the stronger the relationship is.

Page 19: Bivariate Data AS 90425 (3 credits) Complete a statistical investigation involving bi-variate data

Describing scatterplots

Trend – linear or non linear Trend – positive or negative Scatter – strong or weak Scatter – constant or non constant Anything unusual – outliers or

groupings

Page 20: Bivariate Data AS 90425 (3 credits) Complete a statistical investigation involving bi-variate data

Correlation

Correlation measures the strength of the linear association between two quantitative variables

Get the correlation coefficient (r) from your calculator or computer

r has a value between -1 and +1: Correlation has no units

Page 21: Bivariate Data AS 90425 (3 credits) Complete a statistical investigation involving bi-variate data

Calculating the coefficient of correlation (r)

The coefficient of correlation (r) is given by the following formula:

2 2

( )( )

( ) ( )

x x y yr

x x y y

Fortunately you will have Excel do all the work for you!

Page 22: Bivariate Data AS 90425 (3 credits) Complete a statistical investigation involving bi-variate data

What can go wrong?

Use correlation only if you have two quantitative variables (variables that can be measured)

There is an association between gender and weight but there isn’t a correlation between gender and weight! Gender is not quantitative

Use correlation only if the relationship is linear

Beware of outliers!

Page 23: Bivariate Data AS 90425 (3 credits) Complete a statistical investigation involving bi-variate data

Deceptive situations

r is a measure of how one variable varies in a linear relation to the other

An obvious pattern does not always indicate a high value of the coefficient of correlation (r)

A horizontal or vertical trend indicates that there is no relationship, and so r is (close to) zero

Page 24: Bivariate Data AS 90425 (3 credits) Complete a statistical investigation involving bi-variate data

Always plot the data before looking at the correlation!

r = 0 No linear relationship, but

there is a relationship!

r = 0.9 No linear relationship, but

there is a relationship!

Page 25: Bivariate Data AS 90425 (3 credits) Complete a statistical investigation involving bi-variate data

2007

For example: Rata's bean sprouts: r = 0

Minutes spent on measuring the plants10

Height of sprout

mm

10

20

30

Page 26: Bivariate Data AS 90425 (3 credits) Complete a statistical investigation involving bi-variate data

2007

For example: Jenny's bean sprouts: r = 0

Soil depth (cm)2 4 6

Height of sprout

mm

10

20

Page 27: Bivariate Data AS 90425 (3 credits) Complete a statistical investigation involving bi-variate data

Tick the plots where it would be OK to use a correlation coefficient to describe the strength of the relationship:

9876543210

4000300020001000

0

Position Number

Dis

tan

ce (

mill

ion

mile

s)

Distances of Planets from the SunReaction Times (seconds)

for 30 Year 10 Students

0

0.2

0.4

0.6

0.8

0 0.2 0.4 0.6 0.8 1

Non-dominant Hand

Dom

inan

t H

an

d

454035

20

19

18

17

16

15

14

Latitude (°S)   

Mean January Air Temperatures for 30 New Zealand Locations

Tem

pera

ture

(°C

)

Female ($)

Average Weekly Income for Employed New Zealanders in 2001

Male

($

)

0

200

400

600

800

1000

1200

0 200

400

600

800

Page 28: Bivariate Data AS 90425 (3 credits) Complete a statistical investigation involving bi-variate data

Tick the plots where it would be OK to use a correlation coefficient to describe the strength of the relationship:

9876543210

4000300020001000

0

Position Number

Dis

tan

ce (

mill

ion

mile

s)

Distances of Planets from the SunReaction Times (seconds)

for 30 Year 10 Students

0

0.2

0.4

0.6

0.8

0 0.2 0.4 0.6 0.8 1

Non-dominant Hand

Dom

inan

t H

an

d

454035

20

19

18

17

16

15

14

Latitude (°S)   

Mean January Air Temperatures for 30 New Zealand Locations

Tem

pera

ture

(°C

)

Female ($)

Average Weekly Income for Employed New Zealanders in 2001

Male

($

)

0

200

400

600

800

1000

1200

0 200

400

600

800

Not linear

Remove two outliers, nothing happening

Page 29: Bivariate Data AS 90425 (3 credits) Complete a statistical investigation involving bi-variate data

What do I see in this scatter plot?

Appears to be a linear trend, with a possible outlier (tall person with a small foot size.)

Appears to be constant scatter.

Positive association.

22 23 24 25 26 27 28 29

150

160

170

180

190

200

Foot size (cm)

Heig

ht

(cm

)

Height and Foot Size for 30 Year 10 Students

Page 30: Bivariate Data AS 90425 (3 credits) Complete a statistical investigation involving bi-variate data

What will happen to the correlation coefficient if the tallest Year 10 student is removed?

It will get smaller

It won’t change

It will get bigger22 23 24 25 26 27 28 29

150

160

170

180

190

200

Foot size (cm)

Heig

ht

(cm

)

Height and Foot Size for 30 Year 10 Students

Page 31: Bivariate Data AS 90425 (3 credits) Complete a statistical investigation involving bi-variate data

What will happen to the correlation coefficient if the tallest Year 10 student is removed?

It will get bigger22 23 24 25 26 27 28 29

150

160

170

180

190

200

Foot size (cm)

Heig

ht

(cm

)

Height and Foot Size for 30 Year 10 Students

Page 32: Bivariate Data AS 90425 (3 credits) Complete a statistical investigation involving bi-variate data

What do I see in this scatter plot?

Appears to be a strong linear trend.

Outlier in X (the elephant).

Appears to be constant scatter.

Positive association.

6005004003002001000

40

30

20

10

Gestation (Days)

Life

Exp

ect

an

cy (

Years

)

Life Expectancies and Gestation Period for a sample of non-human Mammals

Elephant

Page 33: Bivariate Data AS 90425 (3 credits) Complete a statistical investigation involving bi-variate data

What will happen to the correlation

coefficient if the elephant is removed?

It will get smaller

It won’t change

It will get bigger6005004003002001000

40

30

20

10

Gestation (Days)

Life

Exp

ect

an

cy (

Years

)

Life Expectancies and Gestation Period for a sample of non-human Mammals

Elephant

Page 34: Bivariate Data AS 90425 (3 credits) Complete a statistical investigation involving bi-variate data

What will happen to the correlation

coefficient if the elephant is removed?

It will get smaller

6005004003002001000

40

30

20

10

Gestation (Days)

Life

Exp

ect

an

cy (

Years

)

Life Expectancies and Gestation Period for a sample of non-human Mammals

Elephant